Local, Private, Efficient Protocols
for Succinct Histograms 

Raef Bassily    Adam Smith


Abstract 

We give efficient protocols and matching accuracy lower bounds for frequency estimation in the local 
model for differential privacy. In this model, individual users randomize their data themselves, sending 
differentially private reports to an untrusted server that aggregates them. 

We study protocols that produce a succinct histogram representation of the data. A succinct histogram
is a list of the most frequent items in the data (often called “heavy hitters”) along with estimates
of their frequencies; the frequency of all other items is implicitly estimated as 0.

If there are $n$ users whose items come from a universe of size $d$, our protocols run in time polynomial
in $n$ and $\log(d)$. With high probability, they estimate the frequency of every item up to error
$O\big(\sqrt{\log(d)/(\epsilon^2 n)}\big)$. Moreover, we show that this much error is necessary, regardless of computational
efficiency, and even for the simple setting where only one item appears with significant frequency in the
data set.

Previous protocols (Mishra and Sandler, 2006; Hsu, Khanna and Roth, 2012) for this task either ran
in time $\Omega(d)$ or had much worse error (about $\sqrt[6]{\log(d)/(\epsilon^2 n)}$), and the only known lower bound on error
was $\Omega(1/\sqrt{n})$.

We also adapt a result of McGregor et al (2010) to the local setting. In a model with public coins, we 
show that each user need only send 1 bit to the server. For all known local protocols (including ours), 
the transformation preserves computational efficiency. 
1 Introduction


Consider a software producer that wishes to gather statistics on how people use its software. If the software 
handles sensitive information (for example, a browser for anonymous web surfing or financial management
software), users may not want to share their data with the producer. A producer may not want to
collect the raw data either, lest they be subject to subpoena. How can the producer collect high-quality 
aggregate information about users while providing guarantees to its users (and itself!) that it isn’t storing 
user-specific information? 

In the local model for private data analysis (also called the randomized response model*), each individual
user randomizes her data herself using a randomizer $Q_i$ to obtain a report (or “signal”) $z_i$, which she sends
to an untrusted server to be aggregated into a summary $s$ that can be used to answer queries about the data
(Figure 1). The server may provide public coins visible to all parties, but privacy guarantees depend only on
the randomness of the user’s local coins. The local model has been studied extensively because control of 
private data remains in users’ hands. 





Figure 1: The local model for private data analysis. 

We focus on protocols that provide differential privacy (or, equivalently in the local model, $\gamma$-amplification
[10] or FRAPP).

Definition 1.1. We say that an algorithm $Q : \mathcal{V} \to \mathcal{Z}$ is $(\epsilon, \delta)$-local differentially private (or $(\epsilon, \delta)$-LDP),
if for any pair $v, v' \in \mathcal{V}$ and any (measurable) subset $S \subseteq \mathcal{Z}$, we have

$$\Pr[Q(v) \in S] \le e^{\epsilon} \Pr[Q(v') \in S] + \delta.$$

The special case with $\delta = 0$ is called pure $\epsilon$-LDP.
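
To make the definition concrete, here is a minimal sketch that checks pure $\epsilon$-LDP for the classic one-bit randomized response mechanism. This mechanism is used only as an illustration and is not one of the protocols of this paper; the function names are hypothetical. Because the output space is finite and $\delta = 0$, it suffices to check the ratio on single outputs.

```python
import math

def rr_output_probs(bit, eps):
    """Return {output: probability} for one-bit randomized response on `bit`."""
    p_truth = math.exp(eps) / (math.exp(eps) + 1.0)
    return {bit: p_truth, 1 - bit: 1.0 - p_truth}

def max_privacy_ratio(eps):
    """Largest ratio Pr[Q(v) = z] / Pr[Q(v') = z] over inputs v, v' and outputs z."""
    worst = 0.0
    for v in (0, 1):
        for v_prime in (0, 1):
            for z in (0, 1):
                ratio = rr_output_probs(v, eps)[z] / rr_output_probs(v_prime, eps)[z]
                worst = max(worst, ratio)
    return worst
```

For any `eps > 0`, `max_privacy_ratio(eps)` should agree with `math.exp(eps)` up to floating-point error, which is exactly the bound required by the definition with $\delta = 0$.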

We describe new protocols and lower bounds for frequency estimation and finding heavy hitters in the 
local privacy model. Local differentially private protocols for frequency estimation are used in the Chrome 
web browser (Erlingsson et al. [9], Fanti et al. [11]), and can be used as the basis of other estimation tasks
(see Mishra and Sandler [21] and Dwork and Nissim).

We also show a generic result for LDP protocols: in the public-coin setting, each user only needs to send 
1 bit to the server. 

Suppose that there are $n$ users, and each user $i$ holds a value $v_i$ in a universe of size $d$ (labeled by integers
in $[d] = \{1, \ldots, d\}$). We wish to enable an analyst to estimate frequencies: $f(v) = \frac{1}{n} \#\{i : v_i = v\}$.
Following Hsu et al. [14], we look at summaries that provide two types of functionality:

• A frequency oracle, denoted FO, is a data structure together with an algorithm $A$ that, for any $v \in \mathcal{V}$,
allows computing an estimate $\hat{f}(v) = A(\mathrm{FO}, v)$ of the frequency $f(v)$.

*The term "randomized response” may refer either to the model or a specific protocol; we use “local model” to avoid ambiguity. 
The error of the oracle FO is the maximum over items $v$ of $|\hat{f}(v) - f(v)|$. That is, we measure the
$\ell_\infty$ error of the histogram estimate implicitly defined by $\hat{f}$. A protocol for generating frequency oracles
has error $(\eta, \beta)$ if for all data sets, it produces an oracle with error $\eta$ with probability at least $1 - \beta$.

• A succinct histogram, denoted S-Hist, is a data structure that provides a (short) list of items $v_1, \ldots, v_k$,
called the heavy hitters, together with estimated frequencies $(\hat{f}(v_j) : j \in [k])$. The frequencies of the
items not in the list are implicitly estimated as $\hat{f}(v) = 0$. As with the frequency oracle, we measure the
error of S-Hist by the distance between the estimated and true frequencies, $\max_{v \in [d]} |\hat{f}(v) - f(v)|$.

If a data structure aims to provide error $\eta$, the list need never contain more than $O(1/\eta)$ items (since
items with estimated frequencies below $\eta$ may be omitted from the list, at the price of at most doubling
the error).

If we ignore computation, these two functionalities are equivalent since a succinct histogram defines a 
frequency oracle directly and an analyst with a frequency oracle FO can query the oracle on all possible 
items and retain only those with estimated frequencies above a threshold $\eta$ (increasing the error by at most
$\eta$). However, when the universe size $d$ is large (for example, if a user's input is their browser's home page
or a financial summary), succinct histograms are much more useful. 
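
The equivalence (ignoring computation) can be sketched as follows. This is an illustration only: the oracle is modeled as a plain Python callable, and `succinct_histogram` and `histogram_as_oracle` are hypothetical names. Note that the first function queries every item of the universe, so it takes time linear in $d$, which is precisely the cost that an efficient protocol must avoid.

```python
def succinct_histogram(oracle, universe, eta):
    """Query the oracle on every item and keep those with estimate above eta."""
    return {v: oracle(v) for v in universe if oracle(v) > eta}

def histogram_as_oracle(hist):
    """A succinct histogram defines an oracle: unlisted items are estimated as 0."""
    return lambda v: hist.get(v, 0.0)

# Example: a toy oracle given by a table of estimates.
estimates = {"a": 0.52, "b": 0.31, "c": 0.04, "d": 0.0}
toy_oracle = lambda v: estimates.get(v, 0.0)
toy_hist = succinct_histogram(toy_oracle, estimates.keys(), eta=0.1)
```

Dropping the items below the threshold changes each estimate by at most `eta`, matching the remark above.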

We say a protocol is efficient if it has computation time, communication and storage polynomial in $n$
and $\log(d)$ (the users' input length). Prior to this work, efficient protocols for both tasks satisfied only
$(\epsilon, \delta)$-LDP for $\delta > 0$. Efficient protocols for frequency oracles [21, 14] were known with worst-case expected

error $O\big(\sqrt{\log(d)\log(1/\delta)/(\epsilon^2 n)}\big)$, while the only protocols for succinct histograms [14] had much worse error:
about $\big(\log(d)\log(1/\delta)/(\epsilon^2 n)\big)^{1/6}$. Very recently, Fanti et al. [11] proposed a heuristic construction for which worst-case
bounds are not known. None of these protocols matched the best lower bound on accuracy, $\Omega(1/\sqrt{n})$ [14].


1.1 Our Results 


Efficient Local Protocols for Succinct Histograms with Optimal Error. We provide the first polynomial-time
local $(\epsilon, 0)$-differentially private protocol for succinct histograms that has worst-case error $O\big(\sqrt{\log(d)/(\epsilon^2 n)}\big)$.
As we show, this error is optimal for local protocols (regardless of computation time). Furthermore, in the
public coin model, each participant sends only 1 bit to the server.

Previous constructions were either inefficient [21, 14] (taking time polynomial in $d$ rather than $\log d$), or
had much worse error guarantees (at least $\Omega\big((\log(d)/(\epsilon^2 n))^{1/6}\big)$). Furthermore, constructions with communication
sublinear in $d$ satisfied only $(\epsilon, \delta)$ privacy for $\delta > 0$.

Our construction consists of two main pieces. Our first protocol efficiently recovers a heavy hitter from
the input, given a promise that the heavy hitter is unique; that is, all players either have a particular value $v$
(initially unknown to the server) or a default value $\bot$. The idea is to have each player send a highly noisy
version of an error-correcting encoding of their input; the server can then recover (the codeword for) $v$ by
averaging all the received reports and decoding the resulting vector.

Our full protocol, which works for all inputs, uses ideas from the literature on low-space algorithms and 
compressive sensing, e.g., [12]. Specifically, using random hashing, we can partition the universe of possible
items into bins in which there is likely to be only a single heavy hitter. Running many copies of the protocol 


2 Mishra and Sandler [21] state error bounds for a single query to the frequency oracle, assuming the query is determined before
the protocol is executed. Known frequency oracle constructions (both previous work and ours) achieve error $O\big(\sqrt{\log(1/\beta)/n}\big)$ in
that error model.
for unique heavy hitters in parallel, we can recover the list of heavy hitters. A careful analysis shows that
the cost to privacy is essentially the same as running only a single copy of the underlying protocol. 

Along the way, we provide simpler and more private frequency-oracle protocols. Specifically, we show
that the “JL” protocol of Hsu et al. [14] can be made $(\epsilon, 0)$-differentially private, and can be simplified to
use computations in much smaller dimension (roughly, $O(n)$ instead of $\Omega(n^4 \log d)$).

Lower Bounds on Error. We show that, regardless of computation time and communication, every local
$(\epsilon, \delta)$-DP protocol for frequency estimation has worst-case error $\Omega\big(\sqrt{\log(d)/(\epsilon^2 n)}\big)$ as long as $\delta \ll 1/n$. This
shows that our efficient protocols have optimal error.

The instances that give rise to this lower bound are simple: one particular item $v$ (unknown to the
algorithm) appears with frequency $\eta$, while the remaining inputs are chosen uniformly at random from
$[d] \setminus \{v\}$. The structure of these instances has several implications. First, our lower bounds apply equally well
to worst-case error (over data sets), and “minimax error” (worst-case error over distributions in estimating
the underlying distribution on data).

Second, the accuracy of frequency estimation protocols must depend on the universe size $d$ in the local
model, even if one item appears much more frequently than all others. In contrast, in a centralized model,
there are $(\epsilon, \delta)$-differentially private protocols that achieve error independent of the universe size, assuming
only that there is a small gap between the frequencies of the heaviest and second-heaviest hitters.

The proof of our lower bounds adapts (and simplifies) a framework developed by Duchi et al. for
translating lower bounds on statistical estimation to the local privacy model. We make their framework more
modular, and show that it can be used to prove lower bounds for $(\epsilon, \delta)$-differentially private protocols for
$0 < \delta < 1/n$ (in its original instantiation, it applied only for $\delta = 0$). One lemma, possibly of independent
interest, states that the mutual information between the input and output of a local protocol is at most
$O\big(\epsilon^2 + \frac{\delta}{\epsilon} \log(d\epsilon/\delta)\big)$. In particular, the relaxation with $\delta > 0$ does not allow one to circumvent
information-theoretic lower bounds unless $\delta$ is very large.

1-bit Protocols Suffice for Local Privacy. We show that a slight modification to the compression technique
of McGregor et al. [20, Theorem 14] yields the following: in a public coin model (where the server and
players have access to a common random string), every $(\epsilon, 0)$-DP local protocol can be transformed so that
each user sends only a single bit to the server. Moreover, the transformation is efficient under the assumption
that one can efficiently compute conditional probabilities $Q(y \mid x)$ for the randomizers in the protocol. To
our knowledge, all the local protocols in the literature (in particular, our efficient protocol for heavy hitters) 
satisfy this extra computability condition. 

The randomness of the public coins affects utility but not privacy in the transformed protocol; in particular,
the coins may be generated by the untrusted server, by applying a pseudorandom function to the user's
ID (if it is available), or by expanding a short seed sent by the user using a pseudorandom generator.

The transformation, following [20], is based on rejection sampling: the public coins are used to select
a random sample from a fixed distribution, and a player uses his input to decide whether or not the sample 
should be kept (and used by the server) or ignored. This decision is transmitted as 1 bit to the server. Local 
privacy ensures that the rejection sampling procedure accepts with sufficiently large probability (and leaks 
little information about the input). 
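
The rejection-sampling idea can be sketched on a toy example. Everything specific below is an assumption made for illustration: the stand-in randomizer `q` (one-bit randomized response), the fixed reference input `X0`, and the acceptance rule with constant 1/2 (which needs $e^{\epsilon} \le 2$) are not taken from the text above, and the sketch does not by itself establish the privacy of the transmitted bit. It only checks, by exact computation, that conditioned on acceptance the public sample is distributed as $Q(\cdot \mid x)$.

```python
import math

EPS = 0.5  # assumed privacy parameter; the 1/2 scaling below needs e^EPS <= 2
X0 = 0     # assumed fixed reference input; public coins sample from Q(. | X0)

def q(y, x):
    """Q(y | x) for one-bit randomized response, a stand-in local randomizer."""
    p_truth = math.exp(EPS) / (math.exp(EPS) + 1.0)
    return p_truth if y == x else 1.0 - p_truth

def accept_prob(y, x):
    """Probability that a user with input x sends the bit 1 for public sample y."""
    return 0.5 * q(y, x) / q(y, X0)

def accepted_distribution(x):
    """Exact law of the public sample y conditioned on the user's bit being 1."""
    joint = {y: q(y, X0) * accept_prob(y, x) for y in (0, 1)}
    total = sum(joint.values())  # overall acceptance probability
    return {y: joint[y] / total for y in joint}, total
```

In this toy setting the acceptance probability is exactly 1/2 for every input, and the accepted sample follows $Q(\cdot \mid x)$, so the server can use it as if the user had run the randomizer directly.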

1.2 Other Related Work 

In addition to the works mentioned so far on frequency estimation [21, 14, 9, 11], many papers have studied
the complexity of local private protocols for specific tasks. 
Most relevant here are the results of [15] and of Duchi et al. on learning and statistical estimation in the local model.
Kasiviswanathan et al. [15] showed that when data are drawn i.i.d. from a distribution, then every LDP
learning algorithm can be simulated in the statistical queries model [16]. In particular, they showed that
learning parity and related functions requires an exponential amount of data. Their simulation technique is 
the inspiration for our communication reduction result. 

Recently, Duchi et al. studied a class of convex statistical estimation problems, giving tight
(minimax-optimal) error guarantees. One of the local randomizers developed in that work was the basis for the “basic
randomizer” which is a building block for our protocols. Moreover, our lower bounds are based on the
information-theoretic framework they establish. 

Finally, our efficient protocols are based on ideas from the large literature on streaming algorithms and 
compressive sensing (as were the efficient protocols of Hsu et al. [14]). For example, the use of hashing
to isolate unique “heavy” items appears in the context of sparse approximations to a vector's Fourier
representation [12] (and arguably that idea has roots in learning algorithms for Fourier coefficients such as
[18]). This provides further evidence of the close relationship between low-space algorithms and differential
privacy (see, e.g., RU 8 . 2l fl7ll22ll ). 

2 Building Blocks 

2.1 Useful Tools 

In this subsection, we will introduce some of the tools that we will use in our constructions. 

First, we describe a basic randomizer (Algorithm 1) that will be used in our constructions as a tool to
ensure that each user generates an $\epsilon$-differentially private report. This randomizer is a more concise version
of one of the randomizers in Duchi et al.

Basic randomizer: Our basic randomizer $\mathcal{R}$ takes as input either an $m$-bit string represented by one of the
vertices of the hypercube $\{-\frac{1}{\sqrt{m}}, \frac{1}{\sqrt{m}}\}^m$, or a special symbol represented by the all-zero $m$-length vector
$\mathbf{0}$. The randomizer $\mathcal{R}$ picks a bit $x_j$ at random from the input string $x$ (where $j$ is the index of the chosen
bit), then it randomizes and scales $x_j$ to generate a bit $z_j \in \{-c_\epsilon \sqrt{m}, c_\epsilon \sqrt{m}\}$ (for some fixed $c_\epsilon = O(1/\epsilon)$).
Finally, $\mathcal{R}$ outputs the pair $(j, z_j)$. As will become clear later in our constructions, the $m$-bit input of $\mathcal{R}$ will
be a unique encoding of one of the items in $\mathcal{V}$ whereas the special symbol $\mathbf{0}$ will serve notational purposes
to describe a special situation in our constructions when a user sends no information about its item.

Theorem 2.1. $\mathcal{R}$ has the following properties:

1. $\mathcal{R}$ is $\epsilon$-LDP for every choice of the index $j$
(that is, privacy depends only on the randomness in Step 3).

2. For every $x \in \{-\frac{1}{\sqrt{m}}, \frac{1}{\sqrt{m}}\}^m \cup \{\mathbf{0}\}$, $\mathcal{R}(x)$ is an unbiased estimator of $x$. That is, $\mathbb{E}[\mathcal{R}(x)] = x$.

3. $\mathcal{R}$ is computationally efficient (i.e., $\mathcal{R}$ runs in $O(m)$ time).

As noted in Step 6 of the algorithm, we view the output of $\mathcal{R}$ as a vector $z \in \mathbb{R}^m$ of the same length as
the input vector $x$. However, the output can be represented concisely by only $\lceil \log m \rceil + 1$ bits (required to
describe the index $j$ and $z_j$).

In some settings, we may compress this output to just 1 bit. This comes from the fact that the privacy
of $\mathcal{R}$ holds no matter how the index $j$ is chosen in Step 1 so long as it is independent of the input. (The
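Algorithm 1 $\mathcal{R}$: $\epsilon$-Basic Randomizer

Input: $m$-bit string $x \in \{-\frac{1}{\sqrt{m}}, \frac{1}{\sqrt{m}}\}^m \cup \{\mathbf{0}\}$, and the privacy parameter $\epsilon$.

1: Sample $j \leftarrow [m]$ uniformly at random.

2: if $x \ne \mathbf{0}$ then

3: Randomize the $j$-th bit $x_j$ of the input $x \in \{-\frac{1}{\sqrt{m}}, \frac{1}{\sqrt{m}}\}^m$ as follows:

$$z_j = \begin{cases} c_\epsilon m x_j & \text{w.p. } \frac{e^\epsilon}{e^\epsilon + 1} \\ -c_\epsilon m x_j & \text{w.p. } \frac{1}{e^\epsilon + 1} \end{cases}$$

where $c_\epsilon = \frac{e^\epsilon + 1}{e^\epsilon - 1} = O\left(\frac{1}{\epsilon}\right)$.

4: else

5: Generate a uniform bit: $z_j \leftarrow \{-c_\epsilon \sqrt{m}, c_\epsilon \sqrt{m}\}$.

6: return $z = (0, \ldots, 0, z_j, 0, \ldots, 0) \in \{-c_\epsilon \sqrt{m}, 0, c_\epsilon \sqrt{m}\}^m$ where $z_j$ is in the $j$-th position of $z$. (This
output can be represented concisely by the pair $(j, z_j)$ using $\lceil \log m \rceil + 1$ bits.)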


randomness of $j$ is important for utility since it helps ensure that $\mathbb{E}[\mathcal{R}(x)] = x$.) In particular, the randomness
in the choice of $j$ may come from outside the randomizer: it could be sent by the server, available as
public coins, or generated pseudorandomly from other information. In such situations, the server receives $j$
through other channels and we may represent the output using the single bit describing $z_j$.
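
A minimal Python sketch of the basic randomizer follows. It assumes the constants $c_\epsilon = (e^\epsilon + 1)/(e^\epsilon - 1)$ and keep-probability $e^\epsilon/(e^\epsilon + 1)$, which are the choices that make the report unbiased and $\epsilon$-LDP; the function names and the optional `j` argument (modeling an index supplied by public coins) are hypothetical.

```python
import math
import random

def c_eps(eps):
    """Scaling constant c_eps = (e^eps + 1) / (e^eps - 1) = O(1/eps)."""
    return (math.exp(eps) + 1.0) / (math.exp(eps) - 1.0)

def basic_randomizer(x, eps, rng, j=None):
    """Return the report (j, z_j) for x in {-1/sqrt(m), 1/sqrt(m)}^m or the zero vector.

    If j is None it is sampled locally; otherwise it is taken as given
    (for example from public coins), and only z_j would need to be sent.
    """
    m = len(x)
    if j is None:
        j = rng.randrange(m)
    c = c_eps(eps)
    if any(x):  # x encodes an item
        keep_sign = rng.random() < math.exp(eps) / (math.exp(eps) + 1.0)
        z_j = c * m * x[j] if keep_sign else -c * m * x[j]
    else:       # x is the special all-zero symbol: send a uniform sign
        z_j = rng.choice((-1.0, 1.0)) * c * math.sqrt(m)
    return j, z_j

def expected_report(x, eps):
    """Exact E[R(x)]: coordinate k is picked w.p. 1/m, sign kept w.p. p, flipped w.p. 1 - p.

    For the all-zero input the uniform sign has mean 0, which this formula also returns.
    """
    m = len(x)
    p = math.exp(eps) / (math.exp(eps) + 1.0)
    c = c_eps(eps)
    return [(1.0 / m) * (p - (1.0 - p)) * c * m * x_k for x_k in x]
```

For example, `expected_report([0.5, -0.5, 0.5, 0.5], 1.0)` should reproduce the input vector up to floating-point error, which is the unbiasedness property in Part 2 of Theorem 2.1.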
Johnson-Lindenstrauss Transform: Next, we go over the well-known Johnson-Lindenstrauss lemma that
will be used to efficiently construct a private frequency oracle. This idea was originally used to provide
an inefficient protocol for private estimation of heavy hitters in [14] (as opposed to providing just a private
frequency oracle).

Theorem 2.2 (Johnson-Lindenstrauss lemma). Let $0 < \gamma < 1$ and $d \in \mathbb{N}$. Let $U$ be any set of $t$ points in
$\mathbb{R}^d$ and let $m \ge \frac{8 \log(t)}{\gamma^2}$. There exists a linear map $\Phi : \mathbb{R}^d \to \mathbb{R}^m$ such that $\Phi$ is approximately an isometric
embedding of $U$ into $\mathbb{R}^m$. Namely, for all $x, y \in U$, we have

$$(1 - \gamma)\|x - y\|_2^2 \le \|\Phi(x - y)\|_2^2 \le (1 + \gamma)\|x - y\|_2^2$$
$$|\langle \Phi x, \Phi y \rangle - \langle x, y \rangle| \le O\big(\gamma(\|x\|_2^2 + \|y\|_2^2)\big)$$

Moreover, any random $m \times d$ matrix with entries drawn i.i.d. uniformly from $\{-\frac{1}{\sqrt{m}}, \frac{1}{\sqrt{m}}\}$ enjoys this
property with probability at least $1 - \beta$ when $m = O\left(\frac{\log(t)\log(1/\beta)}{\gamma^2}\right)$. Note that in such case, this matrix
does not depend on the points in $U$ (it only depends on the size of $U$).
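
As a small illustration of the random sign matrices in the theorem, the sketch below builds such a matrix in plain Python and looks at the images of standard basis vectors. The helper names are hypothetical, and the dimensions used in any experiment are arbitrary choices rather than the values prescribed by the theorem.

```python
import math
import random

def random_sign_matrix(m, d, rng):
    """m x d matrix with i.i.d. uniform entries in {-1/sqrt(m), 1/sqrt(m)}."""
    s = 1.0 / math.sqrt(m)
    return [[rng.choice((-s, s)) for _ in range(d)] for _ in range(m)]

def column(phi, v):
    """Phi e_v: the image of the v-th standard basis vector."""
    return [row[v] for row in phi]

def inner(a, b):
    """Euclidean inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))
```

Each column has norm exactly 1, and for two distinct basis vectors the inner product of their images is an average of $m$ independent signs, so it concentrates around the true inner product 0 at scale $1/\sqrt{m}$.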

Basic tools from coding theory: Finally, we review some basics from coding theory that we will use in our
efficient construction of private succinct histograms. For reasons that will become clear later, we will define
a binary code of block length $m$ as a subset of $\{-\frac{1}{\sqrt{m}}, \frac{1}{\sqrt{m}}\}^m$ rather than $\{0, 1\}^m$.

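Definition 2.3 (A binary $(2^k, m, \zeta)$-code). A binary $(2^k, m, \zeta)$-code is a pair of mappings (Enc, Dec) where
Enc $: \{1, \ldots, 2^k\} \to \{-\frac{1}{\sqrt{m}}, \frac{1}{\sqrt{m}}\}^m$ is such that the set of the resulting vectors in $\{-\frac{1}{\sqrt{m}}, \frac{1}{\sqrt{m}}\}^m$, denoted by
$\mathcal{C}$, satisfies the following constraint:

$$\min_{x \ne x' \in \mathcal{C}} \|x - x'\|_2 \ge 2\sqrt{\zeta}, \quad \text{equivalently,} \quad \max_{x \ne x' \in \mathcal{C}} \langle x, x' \rangle \le 1 - 2\zeta,$$

and Dec $: \{-\frac{1}{\sqrt{m}}, \frac{1}{\sqrt{m}}\}^m \to \{1, \ldots, 2^k\}$ is some decoding rule that maps any given element in $\{-\frac{1}{\sqrt{m}}, \frac{1}{\sqrt{m}}\}^m$
to one of the codewords of $\mathcal{C}$.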

The parameter $\zeta$ is known as the relative distance of the code. A binary $(2^k, m, \zeta)$-code can correct
up to a $\zeta/2$-fraction of errors. In other words, a binary $(2^k, m, \zeta)$-code has a decoder Dec such that for any
codeword $x \in \mathcal{C}$ and any erroneous version $y \in \{-\frac{1}{\sqrt{m}}, \frac{1}{\sqrt{m}}\}^m$ of $x$ whose Hamming distance to $x$ is less
than $m\zeta/2$, i.e.,

$$\sum_{j=1}^{m} \mathbf{1}(x_j \ne y_j) < m\zeta/2$$

(or equivalently, $\langle x, y \rangle > 1 - \zeta$), we have Dec$(y) = x$.

Moreover, for any $0 < \zeta < 1/2$, there is a construction of a binary $(2^k, O(k), \zeta)$-code with efficient
encoding and decoding algorithms. In fact, there are several constructions in the coding theory literature that
satisfy this property (for example, see [13]).
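
The relations between Hamming distance, Euclidean distance and inner product in this representation can be checked on a toy code. The sketch below uses a Hadamard code with $m = 2^k$ and relative distance $\zeta = 1/2$; this is an assumption made purely for illustration, since its block length is exponential in $k$ and so it is not one of the constant-rate codes required above. Items are indexed from 0, and the decoder is brute force.

```python
import math

def hadamard_codeword(v, k):
    """Encode v in {0, ..., 2^k - 1} as a vertex of {-1/sqrt(m), 1/sqrt(m)}^m, m = 2^k."""
    m = 2 ** k
    return [(-1) ** bin(v & j).count("1") / math.sqrt(m) for j in range(m)]

def inner(a, b):
    """Euclidean inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def hamming(a, b):
    """Number of coordinates in which a and b differ."""
    return sum(1 for x, y in zip(a, b) if x != y)

def decode(y, k):
    """Brute-force nearest-codeword decoding (maximum inner product with y)."""
    return max(range(2 ** k), key=lambda v: inner(hadamard_codeword(v, k), y))
```

For unit-norm hypercube vertices at Hamming distance $h$, the inner product is $1 - 2h/m$, which is why a bound on Hamming distance translates into the inner-product conditions used in the definition.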


2.2 A Private Frequency Oracle Construction 

We give here an efficient construction of a private frequency oracle based on Johnson-Lindenstrauss projections.
Our construction follows almost the same lines as the construction of [14]. Our version differs in
three respects. First, we use the construction only to provide a frequency oracle as opposed to identifying
and estimating the frequency of heavy hitters. For that purpose, the construction is computationally efficient.
The second difference is in the local randomization step at each user. Here, each user $i \in [n]$ uses an
independent copy of the basic randomizer $\mathcal{R}_i$ given by Algorithm 1 (as opposed to adding noise as in [14]).
This gives us a pure $\epsilon$-differential privacy guarantee (as opposed to $(\epsilon, \delta)$ in [14]). The third difference is that
computations are carried out in much smaller dimension, namely $O(n)$ as opposed to $\Omega(n^4 \log(d))$ in [14].

Given our private frequency oracle, we give a simple efficient algorithm that, for any given input $v \in \mathcal{V}$,
uses the frequency oracle to obtain a private estimate $\hat{f}(v)$ of the frequency $f(v)$ of the item $v$.


Let $\Phi$ denote an $m \times d$ random projection matrix as in Theorem 2.2 with $m = \frac{\log(d+1)\log(2/\beta)}{\gamma^2}$ and
$\gamma = \sqrt{\frac{\log(2d/\beta)}{\epsilon^2 n}}$, where $\beta > 0$ is an input parameter to our algorithm that affects the confidence level of
our error guarantee (but not the privacy guarantee). In our protocol below, we assume the existence of a
source of randomness GenProj that on input integers $m, d > 0$ generates an instance of $\Phi$. The output $\Phi$
of GenProj is assumed to be public, that is, shared by all parties in the protocol (the users and the server).
We note that there are efficient constructions for GenProj that generate a succinct description of $\Phi$ that
is much shorter than $md$ when the columns of the projection matrix $\Phi$ are $k$-wise independent for $k \ll d$.
For our construction, it suffices for the columns of $\Phi$ to be $n$-wise independent (namely, it will still satisfy
the conditions in Theorem 2.2). Hence, the amount of randomness generated by GenProj (describing $\Phi$) is
$O(mn)$ in such case.

We denote the $i$-th standard basis vector in $\mathbb{R}^d$ by $e_i$. The construction protocol of a private frequency oracle is
described below in Algorithm 2.

Note that $m = O(n)$. Hence, the length of each user's report is $O(\log(m)) = O(\log(n))$. Moreover, as
noted above we only need $O(mn) = O(n^2)$ random bits to generate $\Phi$; thus, GenProj runs in time $O(n^2)$.
Also, each basic randomizer is efficient, i.e., runs in $O(m) = O(n)$ time (Part 3 of Theorem 2.1). Hence,
one can easily verify that the construction is computationally efficient.

In Algorithm 3 below, we show that, for any given fixed item $v \in \mathcal{V}$, FO can be used to efficiently give
an estimate $\hat{f}(v)$ of $f(v)$.
Algorithm 2 PROT-FO: $\epsilon$-LDP Frequency Oracle Protocol

Input: Users' inputs $\{v_i \in \mathcal{V} : i \in [n]\}$, the privacy parameter $\epsilon$, and the confidence parameter $\beta > 0$.

1: $\gamma \leftarrow \sqrt{\frac{\log(2d/\beta)}{\epsilon^2 n}}$.

2: $m \leftarrow \frac{\log(d+1)\log(2/\beta)}{\gamma^2}$.

3: $\Phi \leftarrow$ GenProj$(m, d)$.

4: for Users $i = 1$ to $n$ do

5: User $i$ computes $z_i = \mathcal{R}_i(\Phi e_{v_i}, \epsilon)$.

6: User $i$ sends $z_i$ to the server.

7: Server computes $\bar{z} = \frac{1}{n} \sum_{i=1}^{n} z_i$.

8: FO $\leftarrow (\Phi, \bar{z})$.

9: return FO


Algorithm 3 $A_{\mathrm{FO}}$: $\epsilon$-LDP Frequency Estimator Based on FO

Input: Data structure FO $= (\Phi, \bar{z})$ (the frequency oracle), an item $v \in \mathcal{V}$ whose frequency is to be estimated.
1: return $\hat{f}(v) = \langle \Phi e_v, \bar{z} \rangle$.
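
The two algorithms above can be sketched together in a few lines of Python. This is an illustration under stated assumptions, not the exact protocol: items are indexed from 0, the projection is generated with fully independent signs (rather than by a succinct GenProj), the dimension `m` is passed in by the caller instead of being derived from $\gamma$, and the randomizer constants are the ones assumed in the sketch of the basic randomizer. The names `prot_fo` and `estimate` are hypothetical.

```python
import math
import random

def c_eps(eps):
    """Scaling constant of the basic randomizer (assumed form)."""
    return (math.exp(eps) + 1.0) / (math.exp(eps) - 1.0)

def prot_fo(items, d, m, eps, rng):
    """Build FO = (phi, z_bar) from the users' items in {0, ..., d - 1}."""
    n = len(items)
    s = 1.0 / math.sqrt(m)
    phi = [[rng.choice((-s, s)) for _ in range(d)] for _ in range(m)]  # public projection
    p_keep = math.exp(eps) / (math.exp(eps) + 1.0)
    c = c_eps(eps)
    z_bar = [0.0] * m
    for v in items:
        j = rng.randrange(m)                        # coordinate of phi e_v to report
        sign = 1.0 if rng.random() < p_keep else -1.0
        z_bar[j] += sign * c * m * phi[j][v] / n    # server-side averaging of reports
    return phi, z_bar

def estimate(fo, v):
    """A_FO: the inner product <phi e_v, z_bar>."""
    phi, z_bar = fo
    return sum(phi[j][v] * z_bar[j] for j in range(len(z_bar)))
```

Each user touches a single entry of the column $\Phi e_{v_i}$, so the per-user work in this sketch does not depend on $d$ once the projection is available.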


The privacy and utility guarantees of the frequency oracle constructed by PROT-FO above are given in 
the following theorems. 

Theorem 2.4 (Privacy of FO). The construction of the frequency oracle FO given by Algorithm 2 is
$\epsilon$-differentially private.

Proof. The proof follows directly from Part 1 of Theorem 2.1. □


Theorem 2.5 (Error of FO). Let $\epsilon > 0$. For any set of users' items $\{v_1, \ldots, v_n\}$ and any $\beta > 0$, the error due
to FO constructed by Algorithm 2 is bounded as

$$\mathrm{ERR}(f; \mathrm{FO}) = \max_{v \in \mathcal{V}} |\hat{f}(v) - f(v)| = O\left(\frac{1}{\epsilon}\sqrt{\frac{\log(d/\beta)}{n}}\right)$$

with probability at least $1 - \beta$ over the randomness of the projection $\Phi$ and the basic randomizers $\mathcal{R}_i$, $i \in [n]$,
where $\hat{f}(v)$ denotes the output of Procedure $A_{\mathrm{FO}}$ (given by Algorithm 3 above) on an input $v$.

Proof. The proof relies on the good concentration behavior of the inner product between the aggregate
measurement $\bar{z}$ and any vector $y \in \{-\frac{1}{\sqrt{m}}, \frac{1}{\sqrt{m}}\}^m$. This is formalized in the following claim.

Claim 2.6. Let $\beta > 0$. Let $x_1, \ldots, x_n, y \in \{-\frac{1}{\sqrt{m}}, \frac{1}{\sqrt{m}}\}^m$ and let $z_i = \mathcal{R}_i(x_i, \epsilon)$ where $\mathcal{R}_i$, $i \in [n]$, are
independent copies of our basic randomizer (Algorithm 1). Then, with probability at least $1 - \beta$, we have

$$\left| \frac{1}{n} \sum_{i=1}^{n} \langle z_i - x_i, y \rangle \right| \le O\left(\frac{1}{\epsilon}\sqrt{\frac{\log(1/\beta)}{n}}\right).$$

To prove this claim, we first observe that $\langle z_i, y \rangle$, $i \in [n]$, is a sequence of independent random variables
taking values in $\{-O(\frac{1}{\epsilon}), O(\frac{1}{\epsilon})\}$. Also, from the second property of our basic randomizer (Theorem 2.1),
we have $\mathbb{E}[\langle z_i, y \rangle] = \langle x_i, y \rangle$. Putting these together, our claim follows by Hoeffding's inequality.
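
For concreteness, the Hoeffding step can be spelled out as follows. This is a sketch with explicit constants that are not stated in the text; it only uses that each report has a single nonzero coordinate of magnitude $c_\epsilon \sqrt{m}$ with $c_\epsilon = O(1/\epsilon)$, and that every coordinate of $y$ has magnitude $1/\sqrt{m}$.

```latex
% Each term is bounded: |<z_i, y>| = c_eps * sqrt(m) * (1/sqrt(m)) = c_eps.
\[
  \langle z_i, y \rangle \in [-c_\epsilon, c_\epsilon],
  \qquad
  \mathbb{E}\,\langle z_i, y \rangle = \langle x_i, y \rangle .
\]
% Hoeffding's inequality for n independent variables with range of length 2 c_eps:
\[
  \Pr\left[\,\left|\frac{1}{n}\sum_{i=1}^{n} \langle z_i - x_i, y \rangle\right| \ge t\,\right]
  \;\le\; 2\exp\left(-\frac{2 n t^2}{(2 c_\epsilon)^2}\right)
  \;=\; 2\exp\left(-\frac{n t^2}{2 c_\epsilon^2}\right).
\]
% Setting the right-hand side equal to beta gives
\[
  t \;=\; c_\epsilon \sqrt{\frac{2\ln(2/\beta)}{n}}
  \;=\; O\left(\frac{1}{\epsilon}\sqrt{\frac{\log(1/\beta)}{n}}\right).
\]
```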
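To prove the theorem, observe that the error of construction FO can be written as

$$\max_{v \in \mathcal{V}} |\hat{f}(v) - f(v)| = \max_{v \in \mathcal{V}} \left| \langle \bar{z}, \Phi e_v \rangle - \left\langle \frac{1}{n}\sum_{i=1}^{n} e_{v_i}, e_v \right\rangle \right|$$

$$= \max_{v \in \mathcal{V}} \left| \langle \bar{z} - \mathbb{E}[\bar{z}], \Phi e_v \rangle + \left\langle \Phi \frac{1}{n}\sum_{i=1}^{n} e_{v_i}, \Phi e_v \right\rangle - \left\langle \frac{1}{n}\sum_{i=1}^{n} e_{v_i}, e_v \right\rangle \right| \quad (1)$$

$$\le \max_{v \in \mathcal{V}} \left| \frac{1}{n}\sum_{i=1}^{n} \langle z_i - \mathbb{E}[z_i], \Phi e_v \rangle \right| + \max_{v \in \mathcal{V}} \left| \left\langle \Phi \frac{1}{n}\sum_{i=1}^{n} e_{v_i}, \Phi e_v \right\rangle - \left\langle \frac{1}{n}\sum_{i=1}^{n} e_{v_i}, e_v \right\rangle \right| \quad (2)$$

where (1) follows from Part 2 of Theorem 2.1 and (2) follows from the linearity of the inner product and
the triangle inequality.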

Now, by the Johnson-Lindenstrauss lemma (Theorem 2.2), with probability at least $1 - \beta/2$, the second
term is bounded by $\gamma \cdot O(1) = O\left(\frac{1}{\epsilon}\sqrt{\frac{\log(d/\beta)}{n}}\right)$. We next consider the first term. Fix $v \in \mathcal{V}$. By Claim 2.6
above, with probability at least $1 - \frac{\beta}{2d}$, we have

$$\left| \frac{1}{n}\sum_{i=1}^{n} \langle z_i - \mathbb{E}[z_i], \Phi e_v \rangle \right| \le O\left(\frac{1}{\epsilon}\sqrt{\frac{\log(d/\beta)}{n}}\right).$$

Hence, by the union bound, with probability at least $1 - \beta/2$, the first term of (2) is bounded by $O\left(\frac{1}{\epsilon}\sqrt{\frac{\log(d/\beta)}{n}}\right)$.
Thus, with probability at least $1 - \beta$, the error of FO is bounded as in Theorem 2.5. □

Note: The above upper bound is shown to be tight by our result in Section 5.


3 Efficient Error-Optimal Construction of Private Succinct Histograms 


In this section, our goal is to construct an efficient private succinct histogram using the private frequency
oracle given in the previous subsection together with other tools. In Section 3.1, we first give a construction
for a simpler problem that we call the unique heavy hitter problem. Then, in Section 3.2 we give a reduction
from this problem to the general problem.


3.1 The Unique Heavy Hitter Problem 

In the unique heavy hitter problem, we are given the promise that at least an $\eta$ fraction of the $n$ users hold
the same item $v^*$ for some $v^* \in \mathcal{V}$ unknown to the server (here $\eta$ is a parameter of the promise), and that all
other users hold a special symbol $\bot$, representing “no item”.

Our goal is to obtain an efficient construction of a private succinct histogram under this promise, for as
small a value $\eta$ as possible. We will take $\eta$ to be at least $\frac{C}{\epsilon}\sqrt{\frac{\log(d)}{n}}$ for a universal constant $C > 0$. Our
protocol is differentially private on all inputs. Under the promise, with high probability, it outputs the correct
$v^*$ together with an estimate $\hat{f}(v^*)$ of the frequency $f(v^*)$.

The main idea of the protocol is to first encode each user's item with an error-correcting code and randomize
the resulting codeword before sending it to the server. The redundancy in the code allows the server to learn
the unknown item $v^*$ from the noisy reports.

























We require an efficiently encodable and decodable binary $(d, m, \zeta)$-code (of $d$ codewords, block length
$m$, and relative distance $\zeta$) with constant rate (so that $m = O(\log(d))$) and constant
relative minimum distance $\zeta \in (0, 1/2)$, say $\zeta = 1/4$. (We do not require the rate or relative distance $\zeta$
to be optimal; these quantities will affect the constants in the error of our construction but not the asymptotic
behavior.) There are several known constructions of such codes in the literature (see [13] for examples). Fix
one such code, denoted code$(d, m, \zeta)$, with associated encoder $c$ and decoder Dec. The code is part of the
protocol and so is known to all parties. For convenience, we represent codewords as points in the unit-radius
hypercube $\{-\frac{1}{\sqrt{m}}, \frac{1}{\sqrt{m}}\}^m$.


Each user $i$ first encodes its item $v_i$ to obtain $x_i = c(v_i) \in \{-\frac{1}{\sqrt{m}}, \frac{1}{\sqrt{m}}\}^m$, then runs the basic randomizer
$\mathcal{R}_i$ (given by Algorithm 1) on $x_i$ to obtain the report $z_i$. Users that have no item, i.e., users with input
$\bot$, feed the zero vector $x_i = \mathbf{0}$ to the basic randomizer.

The server aggregates the reports by computing $\bar{z} = \frac{1}{n}\sum_{i=1}^{n} z_i$, and then decodes $\bar{z}$ to obtain the
encoding $x$ of $v^*$. One may not be able to feed $\bar{z}$ directly to the decoding algorithm Dec of code$(d, m, \zeta)$
since $\bar{z}$ will not, in general, be a vertex of the hypercube $\{-\frac{1}{\sqrt{m}}, \frac{1}{\sqrt{m}}\}^m$. Instead, the server first rounds
the aggregated signal $\bar{z}$ to the nearest point $y$ in the hypercube before running Dec. We argue that the
combination of noise from randomization and the rounding step produces a vector $y$ that is sufficiently
close to $x$ with high probability.

Algorithm 4 precisely describes our construction for the promise problem. The protocol is computationally
efficient, i.e., the total computational cost is poly$(\log(d), n)$ since code$(d, m, \zeta) = (c, \mathrm{Dec})$ runs in
time poly$(\log(d))$ and each basic randomizer $\mathcal{R}_i$ runs in time $O(\log(d))$. In fact, the computational cost at
each user does not depend on $n$. Also, we note that the users' reports are succinct, namely, the report length
is $O(\log(\log(d)))$ bits.


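Algorithm 4 PROT$^{\mathrm{PP}}$-S-Hist$_{\mathrm{PP}}$: $\epsilon$-LDP Succinct Histogram Protocol under the Promise

Input: Users' inputs $\{v_i \in \mathcal{V} \cup \{\bot\} : i \in [n]\}$, the privacy parameter $\epsilon$, and the confidence parameter
$\beta > 0$.

1: for Users $i = 1$ to $n$ do

2: If $v_i \ne \bot$, then user $i$ encodes its item: $x_i = c(v_i)$. Else, user $i$ sets $x_i = \mathbf{0}$.

3: User $i$ computes its private report: $z_i = \mathcal{R}_i(x_i, \epsilon)$.

4: User $i$ sends $z_i$ to the server.

5: Server computes $\bar{z} = \frac{1}{n}\sum_{i=1}^{n} z_i$.

6: Server computes $y$ by rounding $\bar{z}$ to $\{-\frac{1}{\sqrt{m}}, \frac{1}{\sqrt{m}}\}^m$. That is, for each $j = 1, \ldots, m$,

$$y_j = \begin{cases} \frac{1}{\sqrt{m}} & \text{if } \bar{z}_j \ge 0, \text{ and} \\ -\frac{1}{\sqrt{m}} & \text{otherwise,} \end{cases}$$

where $\bar{z}_j$ denotes the $j$-th entry of $\bar{z}$.

7: Server decodes $y$ into an estimate for the common item $\hat{v} = \mathrm{Dec}(y)$
and computes a frequency estimate $\hat{f}(\hat{v}) = \langle c(\hat{v}), \bar{z} \rangle$.

8: return $(\hat{v}, \hat{f}(\hat{v}))$.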
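
The protocol can be simulated end to end with a toy code. The sketch below makes several assumptions for illustration: it uses a Hadamard code with brute-force decoding in place of an efficient constant-rate code, items are integers in $\{0, \ldots, 2^k - 1\}$, `None` stands in for the symbol $\bot$, and the randomizer constants are the ones assumed in the sketch of the basic randomizer. The name `prot_pp` is hypothetical.

```python
import math
import random

def encode(v, k):
    """Toy Hadamard encoder c(v) into {-1/sqrt(m), 1/sqrt(m)}^m with m = 2^k."""
    m = 2 ** k
    return [(-1) ** bin(v & j).count("1") / math.sqrt(m) for j in range(m)]

def inner(a, b):
    """Euclidean inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def prot_pp(items, k, eps, rng):
    """items[i] is an item in {0, ..., 2^k - 1}, or None for 'no item'."""
    m, n = 2 ** k, len(items)
    c = (math.exp(eps) + 1.0) / (math.exp(eps) - 1.0)
    p_keep = math.exp(eps) / (math.exp(eps) + 1.0)
    z_bar = [0.0] * m
    for v in items:
        j = rng.randrange(m)                       # each user reports one coordinate
        if v is not None:
            sign = 1.0 if rng.random() < p_keep else -1.0
            z_j = sign * c * m * encode(v, k)[j]
        else:
            z_j = rng.choice((-1.0, 1.0)) * c * math.sqrt(m)
        z_bar[j] += z_j / n                        # server-side averaging
    # Round the average to the hypercube, then decode by brute force.
    y = [(1.0 if z >= 0 else -1.0) / math.sqrt(m) for z in z_bar]
    v_hat = max(range(m), key=lambda v: inner(encode(v, k), y))
    return v_hat, inner(encode(v_hat, k), z_bar)
```

With a large enough fraction of users holding the same item, the signs of the averaged coordinates match the codeword of that item, so rounding followed by decoding recovers it.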


Theorem 3.1 (Privacy of S-Hist$_{\mathrm{PP}}$). The construction of the succinct histogram S-Hist$_{\mathrm{PP}}$ given by
Algorithm 4 is $\epsilon$-differentially private.

Proof. Privacy follows directly from the $\epsilon$-differential privacy of the basic randomizers $\mathcal{R}_i$, $i \in [n]$ (Part 1
of Theorem 2.1). □
To analyze utility, we first isolate the guarantee provided by the rounding step. Let $\mathbb{S}^m = \{w \in \mathbb{R}^m :
\|w\|_2 = 1\}$ denote the $m$-dimensional unit sphere.

Lemma 3.2. Let $z \in \mathbb{S}^m$ be such that there is a codeword $x \in \mathcal{C}$ of code$(d, m, \zeta)$ with $\langle z, x \rangle > 1 - \zeta/4$.
Let $y$ be the vector in the hypercube $\{-\frac{1}{\sqrt{m}}, \frac{1}{\sqrt{m}}\}^m$ obtained by rounding each entry $z_j$ of $z$ to $\mathrm{sign}(z_j)/\sqrt{m}$.
Then the Hamming distance between $y$ and $x$ is less than $m\zeta/2$, i.e., $\sum_{j=1}^{m} \mathbf{1}(y_j \ne x_j) < m\zeta/2$.

Proof. Since $z$ and $x$ are unit vectors, the distance $\|z - x\|_2$ satisfies

$$\|z - x\|_2^2 = \|z\|_2^2 + \|x\|_2^2 - 2\langle z, x \rangle < \zeta/2.$$

The vectors $x$ and $y$ disagree in coordinate $j$ only if $|z_j - x_j| \ge \frac{1}{\sqrt{m}}$. There must be fewer than $m\zeta/2$ such
coordinates, since each contributes at least $\frac{1}{m}$ to $\|z - x\|_2^2$. Thus, the Hamming distance between $y$ and $x$ is
$\sum_{j=1}^{m} \mathbf{1}(y_j \ne x_j) < m\zeta/2$, completing the proof. □
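
The lemma can be checked numerically on a single instance. The sketch below is only an illustration with hypothetical helper names: it normalizes a vector, rounds it to the hypercube, and reports the correlation with a target vertex together with the resulting Hamming distance.

```python
import math

def round_to_hypercube(z):
    """Round each entry of z to sign(z_j) / sqrt(m), mapping 0 to the positive value."""
    m = len(z)
    return [(1.0 if z_j >= 0 else -1.0) / math.sqrt(m) for z_j in z]

def lemma_quantities(z, x):
    """Return (<z / ||z||, x>, Hamming distance between round(z) and x)."""
    norm = math.sqrt(sum(z_j * z_j for z_j in z))
    corr = sum(z_j * x_j for z_j, x_j in zip(z, x)) / norm
    y = round_to_hypercube(z)
    dist = sum(1 for y_j, x_j in zip(y, x) if y_j != x_j)
    return corr, dist

# Example: m = 8, x the all-positive vertex, z with one slightly negative entry.
x_example = [1.0 / math.sqrt(8)] * 8
z_example = [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, -0.1]
corr_example, dist_example = lemma_quantities(z_example, x_example)
```

Taking $\zeta = 1/2$ in this example, the correlation exceeds $1 - \zeta/4$ and the rounded vector differs from $x$ in a single coordinate, which is below $m\zeta/2 = 2$ as the lemma predicts.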


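Theorem 3.3 (Error of S-Hist$_{\mathrm{PP}}$ under the promise). Let $\epsilon > 0$. Suppose that the conditions in the
above promise are true for some common item $v^* \in \mathcal{V}$. For any $\beta > 0$, there is a setting of
$\eta = O\left(\frac{1}{\epsilon}\sqrt{\frac{\log(d)\log(1/\beta)}{n}}\right)$ in the promise such that, with probability at least $1 - \beta$, Protocol PROT$^{\mathrm{PP}}$-S-Hist$_{\mathrm{PP}}$
publishes the right item $v^*$ and the frequency estimation error is bounded by

$$\max_{v \in \mathcal{V}} |\hat{f}(v) - f(v)| = O\left(\frac{1}{\epsilon}\sqrt{\frac{\log(1/\beta)}{n}}\right).$$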


Proof. Consider the conditions of the promise. Let $v^* \in \mathcal{V}$ be the unique heavy hitter (occurring with
frequency at least $\eta$). Let $\beta > 0$. Given Lemma 3.2, to show that the protocol above recovers the correct
item $v^*$ with probability at least $1 - \beta/2$, it suffices to show that, with probability at least $1 - \beta/2$, we have

$$\left\langle c(v^*), \frac{\bar{z}}{\|\bar{z}\|_2} \right\rangle > 1 - \zeta/4.$$

Note that the rounding step (Step 6 in Algorithm 4) would produce the same output whether it was run with
$\bar{z}$ or its normalized counterpart $\bar{z}/\|\bar{z}\|_2$.
By the promise, we have

$$\bar{z} = \frac{1}{n}\sum_{i=1}^{n} z_i = \frac{1}{n}\sum_{i \in T} \mathcal{R}_i(c(v^*)) + \frac{1}{n}\sum_{i \in [n] \setminus T} \mathcal{R}_i(\mathbf{0})$$

where $T$ denotes the set of users having the item $v^*$. (Note that $\frac{|T|}{n} = f(v^*) \ge \eta$.)

First, we consider ||z|| 2 - Since for every i E [n], 72, is unbiased (Part 2 of Theorem 2.1 1 , we have 
||E[z]|| 2 = f(v*). Using the triangle inequality, we get ||z || 2 < f(v*) + ||z — E [z] || 2 . Next, we obtain 
an upper bound on ||z — E [z] || 2 . Note that z, . i = 1, ,.,n, are independent and that for every 1 E [n], 
_ q ( Vff j probability 1. Applying McDiarmid’s inequality lfl9l . with probability at least 


2 


1 — /3/4, we have |jz — E [z] || 2 < O 1 1 

bounded by 


m log(l/ P) ) 
n 




Thus, with probability at least 1 — /3/4, ||z || 2 is 


</(«" 


?nlog(l//3) \ 



( 3 ) 
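Next, we consider $\langle c(v^*), z \rangle$. Observe that

$\langle c(v^*), z \rangle = f(v^*) + \frac{1}{n}\sum_{i \in \mathcal{T}} \langle c(v^*), \mathcal{R}_i(c(v^*)) - c(v^*) \rangle + \frac{1}{n}\sum_{i \in [n] \setminus \mathcal{T}} \langle c(v^*), \mathcal{R}_i(0) \rangle.$

By the tail properties of the distribution of the second and third terms (following Claim 2.6), we can show that with probability at least $1 - \beta/4$, we have

$\langle c(v^*), z \rangle \geq f(v^*) - O\left(\frac{1}{\epsilon}\sqrt{\frac{\log(1/\beta)}{n}}\right).$ (4)

Putting (3) and (4) together, then, with probability at least $1 - \beta/2$, we have

$\left\langle c(v^*), \frac{z}{\|z\|_2} \right\rangle \geq \frac{\eta - O\left(\frac{1}{\epsilon}\sqrt{\frac{\log(1/\beta)}{n}}\right)}{\eta + O\left(\frac{1}{\epsilon}\sqrt{\frac{m\log(1/\beta)}{n}}\right)}$

where we use the fact that $\eta \leq f(v^*)$ and assume that the numerator in the right-hand side is positive.

Since $m = O(\log(d))$, there is a constant $a_\zeta$ that depends on $\zeta$ such that if we set $\eta = a_\zeta \frac{1}{\epsilon}\sqrt{\frac{\log(d)\log(1/\beta)}{n}}$, then the above ratio is greater than $1 - \zeta/4$. This proves that there is a setting of $\eta = O\left(\frac{1}{\epsilon}\sqrt{\frac{\log(d)\log(1/\beta)}{n}}\right)$ such that construction PROT_PP-S-Hist_PP outputs $\hat{v} = v^*$ with probability at least $1 - \beta/2$.

Now, conditioned on correct decoding, for all $v \neq v^*$, the estimate $\hat{f}(v)$ is implicitly assumed to be zero (which is perfectly accurate in this case). Thus, it remains to inspect $\hat{f}(v^*)$. Observe that

$|\hat{f}(v^*) - f(v^*)| = |\langle c(v^*), z \rangle - f(v^*)| \leq \left|\frac{1}{n}\sum_{i \in \mathcal{T}} \langle c(v^*), \mathcal{R}_i(c(v^*)) - c(v^*) \rangle\right| + \left|\frac{1}{n}\sum_{i \in [n] \setminus \mathcal{T}} \langle c(v^*), \mathcal{R}_i(0) \rangle\right|.$

Again, by the tail properties of the sums above, with probability at least $1 - \beta/2$, we conclude that $|\hat{f}(v^*) - f(v^*)| \leq O\left(\frac{1}{\epsilon}\sqrt{\frac{\log(1/\beta)}{n}}\right)$.

Therefore, with probability at least $1 - \beta$, protocol PROT_PP-S-Hist_PP recovers the correct common item $v^*$ and the estimation error is bounded by $O\left(\frac{1}{\epsilon}\sqrt{\frac{\log(1/\beta)}{n}}\right)$. □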


3.2 Efficient Construction for the General Problem 


In this section, we provide an efficient construction of private succinct histograms for the general setting of the problem using the two protocols discussed in the previous sections as sub-protocols. Namely, our construction uses an efficient private frequency oracle like FO given in Section 2.2 and an efficient private succinct histogram for the promise problem like S-Hist_PP given in Section 3.1. Our construction is modular and does not depend on the internal structure of the construction protocols or the data structure. Our construction is efficient and succinct as long as the construction of such objects is efficient and succinct. Moreover, our succinct histogram is shown to be error-optimal if the aforementioned objects satisfy the guarantees of Theorems 2.4 and 2.5 (for the frequency oracle) and Theorems 3.1 and 3.3 (for the succinct histogram under the promise).

In the promise problem, the main advantage was the lack of interference from the users who do not hold the heavy hitter $v^*$ in question. The main idea here is to obtain a reduction in which we create the conditions of the promise problem separately for each heavy hitter $v^* \in \mathcal{V}$ such that the extra computational cost is at most a small poly($n$) factor. To do this, we hash each item $v \in \mathcal{V}$ into one of $K$ separate parallel channels such that users holding the same item will transmit their reports in the same channel. Each user, in the remaining $K - 1$ channels, will simulate an “idle” user with item $\bot$ as in the promise problem. By choosing $K$ sufficiently large, and repeating the protocol in parallel for $T$ times (footnote 3), we can guarantee that, with high probability, every heavy hitter $v^* \in \mathcal{V}$ gets assigned to an interference-free channel. Hence, by using an error-optimal construction for the promise problem like S-Hist_PP in each one of these channels, we eventually obtain a list of at most $KT$ items such that, with high probability, all the heavy hitters will be on that list. However, this list may also contain other erroneously decoded items due to hash collisions and we do not know which items on the list are the heavy hitters. To overcome this, in a separate parallel channel of the protocol, we run a frequency oracle protocol (like PROT-FO) and use the resulting frequency oracle to estimate the frequencies of all the items on that list, then output all the items whose estimated frequencies are above $\eta$ together with their estimated frequencies.
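The channel assignment described above can be sketched as follows. This is an illustration under stated assumptions rather than the paper's implementation: `toy_hash` is a hypothetical stand-in (it is not pairwise independent), `None` plays the role of the idle symbol $\bot$, and the items and number of channels are made up.

```python
BOT = None  # stands for the idle symbol (the "bottom" item)

def channel_inputs(items, hash_fn, num_channels):
    """For each channel k, build the item list the promise sub-protocol sees:
    a user keeps its item only in the channel its item hashes to, else BOT."""
    channels = []
    for k in range(num_channels):
        channels.append(
            [v if v is not BOT and hash_fn(v) == k else BOT for v in items]
        )
    return channels

def toy_hash(v):
    """Hypothetical hash into 5 channels, used only to make the example concrete."""
    return (7 * v + 3) % 5

items = [12, 12, 40, 7, 12, 40]
channels = channel_inputs(items, toy_hash, num_channels=5)
```

Here items 12 and 7 land in the same channel, so that channel violates the promise. Repeating with $T$ fresh seeds is what makes such collisions unlikely to persist for a heavy hitter.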

For the purpose of this construction, it suffices to use a pairwise independent hash function. A family of functions $\mathcal{H} = \{h_s \mid h_s : \mathcal{V} \to [K],\ s \in \{0,1\}^\ell\}$ is said to be pairwise independent if for any distinct pair $v \neq v' \in \mathcal{V}$ and any values $j, k \in [K]$, a uniformly sampled member of such a family $h_s$, $s \leftarrow \{0,1\}^\ell$, satisfies both $h_s(v) = j$ and $h_s(v') = k$ simultaneously with probability $\frac{1}{K^2}$. There are efficient constructions of pairwise independent hash families with seed length $\ell = O(\max(\log(d), \log(K)))$. In our construction, we can use any instance of such a family as long as it is efficient. Our hash family (or simply hash) is denoted by Hash that, for a given input seed $s$ and an item $v \in \mathcal{V}$, returns a number in $[K]$. All users and the server are assumed to have access to Hash. Moreover, we use a source of public randomness RndGen that, on an input integer $\ell > 0$, generates a random uniform string from $\{0,1\}^\ell$ that is seen by everyone (footnote 4).
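To make the definition concrete, here is a small sketch of one classical pairwise independent family: random affine maps over GF(2), $h(v) = Av + b$. It is chosen only for illustration and is not necessarily the family the construction has in mind. In particular, its seed is longer than the $O(\max(\log(d), \log(K)))$ bound quoted above, and the range is $\{0, \ldots, K-1\}$ with $K$ a power of two. The function names are hypothetical.

```python
from collections import Counter
from itertools import product

def gf2_hash(rows, offset, v):
    """h(v) = A v + b over GF(2). `rows` holds the rows of A as bit masks,
    `offset` is the vector b packed into an integer, `v` is the item as an integer."""
    out = 0
    for r, row in enumerate(rows):
        parity = bin(row & v).count("1") & 1  # inner product of row and v mod 2
        out |= parity << r
    return out ^ offset

def joint_counts(v1, v2, in_bits, out_bits):
    """Enumerate every seed (A, b) and count how often each pair (h(v1), h(v2)) occurs."""
    counts = Counter()
    for rows in product(range(1 << in_bits), repeat=out_bits):
        for offset in range(1 << out_bits):
            pair = (gf2_hash(rows, offset, v1), gf2_hash(rows, offset, v2))
            counts[pair] += 1
    return counts

# Tiny instance: 3-bit items, K = 4 channels, so there are 8^2 * 4 = 256 seeds.
counts = joint_counts(v1=3, v2=5, in_bits=3, out_bits=2)
```

For two distinct items, every one of the $K^2 = 16$ output pairs is produced by the same number of seeds. This is exactly the probability $1/K^2$ in the definition.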

The parameters of our hash family are $K = n^{3/2}$ and $\ell = O(\max(\log(d), \log(n)))$. Our construction protocol PROT-S-Hist is given by Algorithm 5 below.

It is not hard to see that the total computational cost of this construction is

$O\left(n^{3/2}\log(1/\beta)\,\mathrm{cost}_{PP} + \mathrm{cost}_{FO} + n\,\mathrm{cost}_{\mathcal{A}_{FO}}\right)$

where $\mathrm{cost}_{PP}$, $\mathrm{cost}_{FO}$, and $\mathrm{cost}_{\mathcal{A}_{FO}}$ are the computational costs of the promise problem sub-protocol, the frequency oracle sub-protocol, and the algorithm that computes a given frequency estimate, respectively. Hence, for our choice of the sub-protocols above, one can easily verify that the overall worst-case cost of our construction is $O(n^{5/2}\,\mathrm{poly}(\log(d))\log(1/\beta))$.

The report length of each user is now scaled by $KT$ compared to that of the promise problem, that is, $O(n^{3/2}\log(1/\beta)\log(\log(d)))$. In the next section, we will discuss an approach that gets it down to 1 bit at the expense of increasing the public coins.
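To get a feel for the scale of these parameters, the following sketch computes $K$, $T$, and the resulting channel counts. It assumes the natural logarithm in $T = \lceil \log(3/\beta) \rceil$ and rounds $n^{3/2}$ up to an integer. Both are assumptions made here for concreteness, and the function name is hypothetical.

```python
import math

def channel_parameters(n, beta):
    """Return K = ceil(n^{3/2}), T = ceil(ln(3/beta)), and the channel counts."""
    cube = n ** 3
    root = math.isqrt(cube)                       # integer floor of n^{3/2}
    K = root if root * root == cube else root + 1
    T = math.ceil(math.log(3.0 / beta))
    return {
        "K": K,
        "T": T,
        "promise_channels": K * T,                # channels running the promise sub-protocol
        "total_channels": K * T + 1,              # plus one frequency-oracle channel
    }

params = channel_parameters(n=10_000, beta=0.05)
```

With ten thousand users and $\beta = 0.05$, this already gives millions of channels. That is why each user's report grows by a factor of $KT$ before the 1-bit transformation of the next section.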

Our construction here relies on public randomness represented by the $T$ fresh random strings (seeds), each of length $2\max(\log(d), \log(n))$, which for the setting we consider (footnote 5) is $O(\log(d))$. Hence, the total number of public coins needed is $O(\log(1/\beta)\log(d))$.

3 That is, the total number of parallel channels is $KT$. In each group of $K$ channels, a fresh hash seed is used.

4 We may also think of RndGen as being run at the server, which then announces the resulting random string to all the users.

5 We assume $d \gg n$ for our definitions of computational efficiency and succinctness to be meaningful.
Algorithm 5 PROT-S-Hist: $\epsilon$-LDP Efficient Protocol for Succinct Histograms

Input: Users' inputs $\{v_i \in \mathcal{V} \cup \{\bot\} : i \in [n]\}$, the privacy parameter $\epsilon$, and the confidence parameter $\beta > 0$.

1: List $\leftarrow \emptyset$ {initialize list of heavy hitters to the empty set.}
2: $\ell \leftarrow 2\max(\log(d), \log(n))$; $K \leftarrow n^{3/2}$.
3: $T \leftarrow \lceil \log(3/\beta) \rceil$.
4: for $t = 1$ to $T$ do
5:    $s_t \leftarrow$ RndGen$(\ell)$.
6:    for Channels $k = 1$ to $K$ do
7:       for Users $i = 1$ to $n$ do
8:          If Hash$(s_t, v_i) \neq k$, set $v'_i \leftarrow \bot$. Else, set $v'_i \leftarrow v_i$.
9:       $\hat{v} \leftarrow$ PROT_PP-S-Hist_PP$(\{v'_1, \ldots, v'_n\};\ \frac{\epsilon}{2T+1};\ [...])$ {i.e., run PROT_PP-S-Hist_PP on the modified set of users' items to obtain an estimate $\hat{v}$ of the possibly unique item transmitted in the $k$-th channel.}
10:      If $\hat{v} \notin$ List, then add $\hat{v}$ to List.
11: FO $\leftarrow$ PROT-FO$(\{v_1, \ldots, v_n\};\ \frac{\epsilon}{2T+1};\ [...])$ {i.e., run PROT-FO on the original set of users' items to obtain the frequency oracle FO.}
12: for $\hat{v} \in$ List do
13:    $\hat{f}(\hat{v}) \leftarrow \mathcal{A}_{FO}(\mathrm{FO}, \hat{v})$. {$\mathcal{A}_{FO}$ is the frequency estimator given in Section 2.2.}
14:    If $\hat{f}(\hat{v}) < \frac{2T+1}{\epsilon}\sqrt{\frac{\log(d)\log(1/\beta)}{n}}$, remove $\hat{v}$ from List.
15: return $\{(\hat{v}, \hat{f}(\hat{v})) : \hat{v} \in \mathrm{List}\}$.
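Steps 12 to 15 amount to a threshold filter over the candidate list. A minimal sketch follows, assuming natural logarithms and using a plain dictionary as a stand-in for the frequency oracle. The names `pruning_threshold`, `prune_candidates`, and `toy_estimates` are hypothetical, and the numbers are invented for illustration.

```python
import math

def pruning_threshold(n, d, eps, beta, T):
    """Threshold (2T+1)/eps * sqrt(log(d) * log(1/beta) / n) used in Step 14."""
    return (2 * T + 1) / eps * math.sqrt(math.log(d) * math.log(1.0 / beta) / n)

def prune_candidates(candidates, estimate, threshold):
    """Keep each distinct decoded candidate whose estimated frequency
    is at least the threshold, together with that estimate."""
    kept = {}
    for v in candidates:
        if v is None:            # a channel that decoded nothing useful
            continue
        f_hat = estimate(v)
        if f_hat >= threshold:
            kept[v] = f_hat
    return kept

threshold = pruning_threshold(n=10**6, d=2**20, eps=1.0, beta=0.05, T=5)
toy_estimates = {"a": 0.30, "b": 0.01, "c": 0.12}   # stand-in for the oracle's answers
succinct_histogram = prune_candidates(
    ["a", "b", "a", "c", None], lambda v: toy_estimates[v], threshold
)
```

Items that were decoded erroneously because of hash collisions tend to have small true frequency, so the oracle's estimate for them falls below the threshold and they are dropped.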


Theorem 3.4 (Privacy of PROT-S-Hist). Protocol PROT-S-Hist given by Algorithm 5 is $\epsilon$-differentially private.

Proof. First, observe that Protocol PROT-S-Hist runs Protocol PROT_PP-S-Hist_PP over $KT$ channels and runs Protocol PROT-FO once over a separate channel. In the first $KT$ channels, for any fixed sequence of the values of the seed of the hash function, the reports of each user over these channels are independent. Moreover, each user gets assigned to exactly $T$ channels. Fix any user $i$ and any two items $v_i, v'_i \in \mathcal{V}$. Using these observations, one can see that, for any fixed sequence of values of the seed of the hash over these $KT$ channels, the distribution of the report of user $i$ when its item is $v_i$ differs from the distribution when the user's item is $v'_i$ in at most $2T$ channels, and in each of these channels, the ratio between the two distributions is at most $e^{\epsilon/(2T+1)}$ by the differential privacy of PROT_PP-S-Hist_PP (note that the input privacy parameter in Step 9 is $\frac{\epsilon}{2T+1}$). Hence, by independence of the user's reports over separate channels, the corresponding ratio over all the $KT$ channels is at most $e^{2T\epsilon/(2T+1)}$. In the separate channel for the frequency oracle protocol, again by the differential privacy of PROT-FO, this ratio is bounded by $e^{\epsilon/(2T+1)}$. Multiplying the two bounds gives an overall ratio of at most $e^{2T\epsilon/(2T+1)} \cdot e^{\epsilon/(2T+1)} = e^{\epsilon}$, which completes the proof. □

Theorem 3.5 (Error of PROT-S-Hist). For any set of users' items $\{v_1, \ldots, v_n\}$ and any $\beta > 0$, there is a number $\eta = O\left(\frac{\log^{3/2}(1/\beta)}{\epsilon}\sqrt{\frac{\log(d)}{n}}\right)$ such that, with probability at least $1 - \beta$, Protocol PROT-S-Hist outputs $\{(\hat{v}, \hat{f}(\hat{v})) : \hat{v} \in \mathrm{List}\}$ where $\mathrm{List} = \{v^* \in \mathcal{V} : f(v^*) > \eta\}$ (i.e., a list of all items whose frequencies are greater than $\eta$), and the error in the frequency estimates satisfies

$\max_{v \in \mathcal{V}} |\hat{f}(v) - f(v)| = O\left(\frac{\log^{3/2}(1/\beta)}{\epsilon}\sqrt{\frac{\log(d)}{n}}\right).$

(As mentioned before, the frequency estimates of items $v \notin \mathrm{List}$ are implicitly zero.)

Proof. Let $U$ denote the set of the users' items $\{v_1, \ldots, v_n\}$. We first show that for the setting of $K$ and $T$ in Algorithm 5, running PROT_PP-S-Hist_PP over $KT$ channels will isolate every heavy hitter (i.e., every item occurring with frequency at least $\eta$) in at least one channel without interference from other items. Let Heavhit $= \{v^* \in \mathcal{V} : f(v^*) \geq \eta\}$ denote the set of the heavy hitters. Note that $|\mathrm{Heavhit}| \leq \frac{1}{\eta}$.

Claim 3.6. If $\eta \geq \frac{2T+1}{\epsilon}\sqrt{\frac{\log(d)\log(1/\beta)}{n}}$, then with probability at least $1 - \beta/3$ (over the sequence of the seed values $s_1, \ldots, s_T$ of Hash), for every heavy hitter $v^* \in$ Heavhit there is $t \in [T]$ such that Hash$(s_t, v^*) \neq$ Hash$(s_t, v)$ for all $v \in U \setminus \{v^*\}$.

First, we prove this claim. Fix $v^* \in$ Heavhit. Let $t \in [T]$. Let $\mathrm{Coll}_{s_t}(v^*) = |\{i \in [n] : \mathrm{Hash}(s_t, v^*) = \mathrm{Hash}(s_t, v_i),\ v_i \neq v^*\}|$ denote the number of collisions between $v^*$ and users' items that are different from $v^*$ when the hash seed is $s_t$. First, we bound the expected number of such collisions:

$E\left[\mathrm{Coll}_{s_t}(v^*)\right] \leq \sum_{i \in [n] : v_i \neq v^*} \frac{1}{K} \leq \frac{n}{K} = \frac{1}{\sqrt{n}}.$

Hence, by Markov's inequality, with probability at least $1 - \frac{1}{\sqrt{n}}$, $\mathrm{Coll}_{s_t}(v^*) = 0$. Hence, with probability at least $1 - \frac{1}{\eta}\left(\frac{1}{\sqrt{n}}\right)^T \geq 1 - \beta/3$, for each $v^* \in$ Heavhit, there exists $t \in [T]$ such that $\mathrm{Coll}_{s_t}(v^*) = 0$, which proves the claim.


This implies that with probability at least $1 - \beta/3$, there is a set $W \subseteq [KT]$ of “good” channels whose size is the same as the number of heavy hitters such that each heavy hitter $v^* \in$ Heavhit is hashed into one of these channels without collisions. Conditioned on this event, let $w \in W$ and let $v^*_w$ denote the heavy hitter in channel $w$. By Theorem 3.3, running Protocol PROT_PP-S-Hist_PP over channel $w$ yields the correct estimate of $v^*_w$ with high probability, as set by the confidence parameter passed in Step 9 of Algorithm 5. Hence, with probability at least $1 - \beta/3$, all estimates of PROT_PP-S-Hist_PP in all channels in $W$ are correct. Hence, at this point, with probability at least $1 - 2\beta/3$, List contains all the heavy hitters in Heavhit among other possibly unreliable estimates of PROT_PP-S-Hist_PP for the channels in $[KT] \setminus W$.

Now, conditioned on the event above, by the error guarantee of FO given by Theorem 2.5, with probability at least $1 - \beta/3$, the maximum error in the frequency estimates of all the items in List (Step 13 of Algorithm 5), denoted by Err(List), is

$O\left(\frac{2T+1}{\epsilon}\sqrt{\frac{\log(d/\beta)}{n}}\right) = O\left(\frac{\log(1/\beta)}{\epsilon}\sqrt{\frac{\log(d/\beta)}{n}}\right).$

Hence, all those items in List with actual frequencies greater than

$\eta \triangleq \frac{2T+1}{\epsilon}\sqrt{\frac{\log(d)\log(1/\beta)}{n}} + \mathrm{Err}(\mathrm{List}) = O\left(\frac{\log^{3/2}(1/\beta)}{\epsilon}\sqrt{\frac{\log(d)}{n}}\right)$

will be kept in List whereas all those items with frequency less than $\eta$ will be removed. Note that the frequency estimates that are implicitly assumed to be zero of those items that are not on the list cannot have error greater than $\eta$ since their actual frequencies are less than $\eta$. This completes the proof. □

4 The Full Protocol 

4.1 Generic Protocol with 1-Bit Reports 

In this section, we give a generic approach that transforms any private protocol in the distributed setting (not necessarily for frequency estimation or succinct histograms) to a private distributed protocol where the report of each user is a single bit at the expense of adding to the overall original public randomness a number of bits that is $O(nr)$ where $r$ is the length of each user's report in the original protocol. As mentioned in the introduction, the transformation is a modification of the general compression technique of McGregor et al. (2010).

Consider a generic private distributed protocol GenPROT in which $n$ users are communicating with an untrusted server. For any $n \in \mathbb{N}$, the protocol follows the following general steps. As before, each user $i \in [n]$ has a data point $v_i$ that lives in some finite set $\mathcal{V} = [d]$. Let $Q_i : \mathcal{V} \cup \{\bot\} \to \mathcal{Z}$ be any $\epsilon$-local randomizer of user $i \in [n]$. We assume, w.l.o.g., that $Q_i$ may also take a special symbol $\bot$ as an input. Each user runs its $\epsilon$-local randomizer $Q_i$ on its input data $v_i$ (and any public randomness in the protocol, if any) and outputs a report $z_i$. For simplicity, each report $z_i$ is assumed to be a binary string of length $r$. Let stat $: \mathcal{V}^n \to \mathcal{C}$ be some statistic that the server wishes to estimate where $\mathcal{C}$ is some bounded subset of $\mathbb{R}^k$ for some integer $k > 0$. The server collects the reports $\{z_i : i \in [n]\}$ and runs some algorithm $\mathcal{A}_{stat}$ on the users' reports (and the public randomness) and outputs an estimate $\widehat{\mathrm{stat}} \in \mathcal{C}$ of stat$(v_1, \ldots, v_n)$.

We now give a generic construction 1-Bit-PROT for protocol GenPROT where each user's report is one bit (see Algorithm 6 below).

Algorithm 6 1-Bit-PROT: $\epsilon$-LDP Generic 1-Bit Protocol

Input: Users' inputs $\{v_i \in \mathcal{V} : i \in [n]\}$ and a privacy parameter $\epsilon \leq \ln(2)$.

1: Generate $n$ independent public strings $y_1 \leftarrow Q_1(\bot), \ldots, y_n \leftarrow Q_n(\bot)$.
2: for Users $i = 1$ to $n$ do
3:    Compute $p_i = \frac{1}{2} \cdot \frac{\Pr[Q_i(v_i) = y_i]}{\Pr[Q_i(\bot) = y_i]}$.
4:    Sample a bit $b_i$ from Bernoulli$(p_i)$ and send it to the server.
5: Reports $\leftarrow \emptyset$. {Server initializes the set of collected reports.}
6: for $i = 1$ to $n$ do
7:    Server checks if $b_i = 1$, add $y_i$ to Reports.
8: $\widehat{\mathrm{stat}} \leftarrow \mathcal{A}_{stat}(\mathrm{Reports})$. {Run algorithm $\mathcal{A}_{stat}$ on the collected reports to obtain an estimate of the desired statistic as described in the original protocol GenPROT.}
9: return $\widehat{\mathrm{stat}}$.


Note also that the only additional computational cost in this generic transformation is in Step 3. If computing these probabilities can be done efficiently, then this transformation preserves the computational efficiency of the original protocol.

Theorem 4.1 (Privacy of 1-Bit-PROT). Protocol 1-Bit-PROT given by Algorithm 6 is $\epsilon$-LDP.
Proof. Consider the output bit $b_i$ of any user $i \in [n]$. First, note that $p_i$ (in Step 3) is a valid probability since for any item $v_i \in \mathcal{V}$, the right-hand side of Step 3 is at most $\frac{e^\epsilon}{2}$ by $\epsilon$-differential privacy of $Q_i$, and since $\epsilon \leq \ln(2)$, $p_i \leq 1$. For any $v \in \mathcal{V}$ and any public string $y_i$, let $p_i(v, y_i)$ denote the conditional probability that $b_i = 1$ given that $Q_i(\bot) = y_i$ when the item of user $i$ is $v$. Let $v, v' \in \mathcal{V}$ be any two items. It is easy to see that $\frac{p_i(v, y_i)}{p_i(v', y_i)} = \frac{\Pr[Q_i(v) = y_i]}{\Pr[Q_i(v') = y_i]}$, which lies in $[e^{-\epsilon}, e^{\epsilon}]$ by $\epsilon$-differential privacy of $Q_i$. One can also verify that $\frac{1 - p_i(v, y_i)}{1 - p_i(v', y_i)}$ also lies in $[e^{-\epsilon}, e^{\epsilon}]$. □

One important feature in the construction above is that the conditional distribution of the public string $y_i$ given that $b_i = 1$ is exactly the same as the distribution of $Q_i(v_i)$, i.e., $\Pr[Q_i(\bot) = y_i \mid b_i = 1] = \Pr[Q_i(v_i) = y_i]$, and hence, upon receiving a bit $b_i = 1$ from user $i$, the server's view of $y_i$ is the same as its view of an actual report $z_i \leftarrow Q_i(v_i)$, as was the case in the original protocol.

We note that the probability that a user $i \in [n]$ accepts (sets $b_i = 1$), taken over the randomness of $y_i$, is

$\sum_{y} \frac{1}{2} \cdot \frac{\Pr[Q_i(v_i) = y]}{\Pr[Q_i(\bot) = y]} \cdot \Pr[Q_i(\bot) = y] = \frac{1}{2}.$
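Both facts can be verified exactly on a toy randomizer. The sketch below assumes that $Q$ is binary randomized response on items in {0, 1}, with $Q(\bot)$ a uniform bit. This randomizer is chosen here only because its output distribution can be enumerated by hand, and the function names are hypothetical.

```python
import math

def rr_prob(v, y, eps):
    """Pr[Q(v) = y] for binary randomized response; v = None plays the role of the idle input."""
    if v is None:
        return 0.5
    keep = math.exp(eps) / (1.0 + math.exp(eps))
    return keep if y == v else 1.0 - keep

def accept_prob(v, y, eps):
    """p_i from Step 3 of Algorithm 6: half the likelihood ratio at the public string y."""
    return 0.5 * rr_prob(v, y, eps) / rr_prob(None, y, eps)

def one_bit_view(v, eps):
    """Exact acceptance probability and the law of the public string y given b = 1."""
    joint = {y: rr_prob(None, y, eps) * accept_prob(v, y, eps) for y in (0, 1)}
    accept = sum(joint.values())
    conditional = {y: joint[y] / accept for y in joint}
    return accept, conditional

eps = 0.5  # must be at most ln(2) so that p_i is a valid probability
accept, conditional = one_bit_view(v=1, eps=eps)
```

The acceptance probability comes out as one half, and the conditional law of the accepted public string matches the law of a genuine report. This is the rejection-sampling view behind the key statement that follows.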


Key statement: The two facts above show that our protocol is functionally equivalent to: first, sampling a subset of the users where each user is sampled independently with probability 1/2, then running the original protocol GenPROT on the sample. Thus, if the original protocol is resilient to sampling, meaning that its error performance (with respect to some notion of error) is not essentially affected by this sampling step, then the generic transformation given by Algorithm 6 will have essentially the same error performance.

We now formalize this statement. Let $\xi : \mathcal{C} \times \mathcal{C} \to [0, \infty]$ be some notion of error (not necessarily a metric) between any two points in $\mathcal{C}$. For any given set of users' data $\{v_1, \ldots, v_n\}$, the error of the protocol GenPROT is defined as

$\mathcal{E}_\xi(\mathrm{stat}; \mathrm{GenPROT}(v_1, \ldots, v_n)) = \xi(\mathrm{stat}(v_1, \ldots, v_n), \widehat{\mathrm{stat}})$ (5)

Let Samp be a random sampling procedure that takes any set of users' data $\{v_1, \ldots, v_n\}$ and constructs a set $S$ by sampling each point $v_i$, $i \in [n]$, independently with probability 1/2. We say that GenPROT is sampling-resilient in estimating stat with respect to $\xi$ if for any set of users' data $(v_1, \ldots, v_n) \in \mathcal{V}^n$ and any $\beta > 0$, whenever

$\mathcal{E}_\xi(\mathrm{stat}; \mathrm{GenPROT}(v_1, \ldots, v_n)) = O(g(n, d, k, \epsilon))$

for some non-negative function $g$ with probability at least $1 - \beta$ over all the randomness in GenPROT, then

$\mathcal{E}_\xi(\mathrm{stat}; \mathrm{GenPROT}(\mathrm{Samp}(v_1, \ldots, v_n))) = O(g(n, d, k, \epsilon))$

with probability at least $1 - 2\beta$ over all the randomness in GenPROT and Samp.

Theorem 4.2 (Characterization of error under sampling-resilience). Suppose that for any set of users' data and any $\beta > 0$, the original protocol GenPROT has error (with respect to $\xi$) that is bounded by some non-negative number $g = g(n, d, k, \epsilon)$ with probability at least $1 - \beta$ over the randomness in GenPROT. If GenPROT is sampling-resilient in estimating stat with respect to $\xi$, then construction 1-Bit-PROT yields error (with respect to $\xi$) that is $O(g(n, d, k, \epsilon))$.

4.2 Efficient Construction of Succinct Histograms with 1-Bit Reports

We now apply the generic transformation discussed above to our efficient protocol PROT-S-Hist (Algorithm 5) to obtain an efficient private protocol for succinct histograms with 1-bit reports and optimal error.

Computational efficiency: To show that the protocol remains efficient after this transformation, we argue that the probabilities in Step 3 of Algorithm 6 can be computed efficiently in our case. The overall $\epsilon$-local randomizer $Q_i^{full}$ at each user $i$ over all the $KT + 1$ parallel channels in PROT-S-Hist is described in Algorithm 7. Note that given the user's item $v_i$ and the seed of the hash, the $KT + 1$ components of $Q_i^{full}(v_i)$ are independent. Moreover, note that $(K - 1)T$ of these components have the same (uniform) distribution since each user's item affects only $T + 1$ channels (the $T$ channels selected by the hash function and the frequency oracle channel), and in the remaining channels the user's report is uniformly random. Hence, to execute Step 3 of Algorithm 6, each user in our case only needs to compute $T + 1$ probabilities out of the total $KT + 1$ components. This is easy because of the way the basic randomizer $\mathcal{R}$ works. To see this, first note that each $y_i$ (referring to the public string $y_i$ in Algorithm 6) is now a sequence of (index, bit) pairs: $(j_1, b_{j_1}), \ldots, (j_{KT+1}, b_{j_{KT+1}})$. To compute the probability corresponding to one of the $T + 1$ item-dependent components of $Q_i^{full}(v_i)$, each user first locates in the public string $y_i$ the pair $(j, b)$ corresponding to this component. Then, it compares the sign of the $j$-th bit of the encoding of its item $v_i$ (footnote 6) with the sign of $b$. If signs are equal, then the desired probability is $\frac{e^\epsilon}{e^\epsilon + 1}$, otherwise it is $\frac{1}{e^\epsilon + 1}$. Hence, the computational cost of this step (per user) is $O(T\log(m_{PP}) + \log(m_{FO})) = O(\log(\log(d))\log(1/\beta) + \log(n))$ where $m_{PP}$ is the length of the encoding $c(v_i)$ used in the promise problem protocol PROT_PP-S-Hist_PP and $m_{FO}$ is the length of the encoding $\phi_{v_i}$ used in the frequency oracle protocol PROT-FO. Thus, at worst the overall computational cost of the 1-Bit protocol is the same as that of protocol PROT-S-Hist.
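The per-user computation in Step 3 can be sketched as a product of $T + 1$ likelihood ratios. The sketch makes two assumptions that are not stated in this section. First, the basic randomizer reports a uniformly random index together with a sign that agrees with the encoding with probability $e^{\epsilon'}/(e^{\epsilon'}+1)$, where $\epsilon'$ is whatever privacy parameter the channel is run with. Second, an idle input yields a uniformly random sign. The function names are hypothetical.

```python
import math

def component_ratio(encoding_sign, reported_sign, eps_channel):
    """Likelihood ratio Pr[item report] / Pr[idle report] for one item-dependent
    component; the uniformly random index contributes the same factor to both
    and cancels (assumed randomizer model, see the lead-in)."""
    agree = math.exp(eps_channel) / (math.exp(eps_channel) + 1.0)
    p_item = agree if encoding_sign == reported_sign else 1.0 - agree
    p_idle = 0.5
    return p_item / p_idle

def accept_probability(encoding_signs, reported_signs, eps_channel):
    """p_i = 1/2 times the product of the ratios over the item-dependent components.
    Components where the user is idle have ratio 1 and can be skipped."""
    ratio = 1.0
    for enc, rep in zip(encoding_signs, reported_signs):
        ratio *= component_ratio(enc, rep, eps_channel)
    return 0.5 * ratio

T = 2
eps = 0.6                       # overall budget, at most ln(2)
eps_channel = eps / (2 * T + 1)
p_all_agree = accept_probability([+1, -1, +1], [+1, -1, +1], eps_channel)
p_all_differ = accept_probability([+1, -1, +1], [-1, +1, -1], eps_channel)
```

Only $T + 1$ sign comparisons are needed per user, which is what keeps the transformation cheap even though the public string has $KT + 1$ components.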


Algorithm 7 $Q_i^{full}$: $\epsilon$-Local Randomizer of User $i$ in PROT-S-Hist (Algorithm 5)

Input: item $v_i \in \mathcal{V}$, privacy parameter $\epsilon$, seeds of Hash $s_1, \ldots, s_T$.

1: for $t = 1$ to $T$ do
2:    for Channels $k = 1$ to $K$ do
3:       If Hash$(s_t, v_i) \neq k$, set $z_i^{(t,k)} = \mathcal{R}_i(0, \epsilon)$. Else, set $z_i^{(t,k)} = \mathcal{R}_i(c(v_i), \epsilon)$. {$z_i^{(t,k)}$ denotes the report of user $i$ in the $k$-th channel in the $t$-th group.}
4: Set $z_i^{(FO)} = \mathcal{R}_i(\phi_{v_i}, \epsilon)$. {$\phi_{v_i}$ is the $v_i$-th column of $\Phi$, the encoding matrix in the construction of the frequency oracle FO.}
5: return $z_i = \left(z_i^{(t,k)}, z_i^{(FO)} : t = 1, \ldots, T;\ k = 1, \ldots, K\right)$.


Our 1-Bit Protocol gives the same privacy and error guarantees as PROT-S-Hist.

Theorem 4.3 (Privacy of the 1-Bit Protocol for Succinct Histograms). The 1-Bit Protocol for succinct histograms is $\epsilon$-differentially private.

Proof. The proof follows directly from Theorems 3.4 and 4.1. □

Theorem 4.4 (Error of the 1-Bit Protocol for Succinct Histograms). The 1-Bit Protocol for succinct histograms provides the same guarantees as Protocol PROT-S-Hist given in Theorem 3.5.


6 This encoding is either $c(v_i)$ or $\phi_{v_i}$ depending on whether we are at Step 3 or Step 4 of Algorithm 7.
Proof. The proof follows from Theorem 4.2 and the fact that Protocol PROT-S-Hist is sampling-resilient. Note that for any $\beta > 0$, any $n$, and any set of users' items $\{v_1, \ldots, v_n\}$, PROT-S-Hist satisfies the error guarantees of Theorem 3.5 with probability at least $1 - \beta$. Thus, sampling a subset of users' items (where each item is picked with probability 1/2), then feeding this set to PROT-S-Hist will only lead to an extra error term of $O(1/\sqrt{n})$, which will be swamped by the original error term of $O\left(\frac{\log^{3/2}(1/\beta)}{\epsilon}\sqrt{\frac{\log(d)}{n}}\right)$. □


5 Tight Lower Bound on the Error 


In this section, we provide a lower bound of $\Omega\left(\frac{1}{\epsilon}\sqrt{\frac{\log(d)}{n}}\right)$ on the error of frequency oracles and succinct histograms under the $(\epsilon, \delta)$-local privacy constraint. Our lower bound is the same for pure $\epsilon$ as for $(\epsilon, \delta)$-LDP algorithms when $\delta = o\left(\frac{1}{n\log(n)}\right)$. Namely, our lower bound shows that there is no advantage of $(\epsilon, \delta)$ algorithms over pure $\epsilon$ algorithms in terms of asymptotic error for all meaningful settings of $\delta$. In fact, it is standard to assume that $\delta = o(1/n)$, say $\delta \approx 1/n^2$, since otherwise there are trivial examples of algorithms that are clearly non-private yet they satisfy the definition of $(\epsilon, \delta)$ differential privacy (for example,
see HI Example 2]). 

Our lower bound matches the upper bound (for both frequency oracles and succinct histograms) discussed in previous sections. Hence, the efficient constructions given in Sections 2.2, 3, and 4.2 yield the optimal error. Our lower bound also shows that some previous constructions yield the optimal error, namely,
the constructions of lfl4l and 11211 . However, as discussed in Section[2| those constructions are computation¬ 
ally inefficient when used directly to construct succinct histograms. 


Our Technique: Our approach is inspired by the techniques used by Duchi et al. 0 to obtain lower bounds 
on the statistical minimax rate (expected worst-case error) of multinomial estimation in the pure e local 
model. In a scenario where the item of each user is drawn independently from an unknown distribution 
on V, we first derive a lower bound on the expected worst-case error (the minimax rate) in estimating the 
right distribution. We then show using standard concentration bounds that this implies a lower bound on 
the maximum error in estimating the actual frequencies of all the items in V. To obtain a lower bound 
on the minimax rate, we first define the notion of an $\eta$-degrading channel, which is a noise operator that, given a user's item as input, outputs the same item with probability $\eta$, and outputs a uniform random item from $\mathcal{V}$ otherwise. We compare two scenarios: in the first scenario, each user feeds its item first to an $\eta$-degrading channel, then feeds its output into its $(\epsilon, \delta)$ local randomizer to generate a report, whereas the second scenario is the normal scenario where the user feeds its item directly into its local randomizer. We then argue that a lower bound of $\Omega(1)$ on the minimax error in the first scenario implies a lower bound of $\Omega(\eta)$ in the second scenario. Next, we show that a lower bound of $\Omega(1)$ is true in the first scenario with an $\eta$-degrading channel when $\eta = \Omega\left(\frac{1}{\epsilon}\sqrt{\frac{\log(d)}{n}}\right)$, which gives us the desired lower bound. To derive the $\Omega(1)$ lower bound in the first scenario, we proceed as follows. First, we derive an upper bound of $O\left(\epsilon^2 + \frac{\delta}{\epsilon}\log(d\epsilon/\delta)\right)$ on the mutual information between a uniform random item $V$ from $\mathcal{V}$ and the output of an $(\epsilon, \delta)$-local randomizer with input $V$. Then, we prove that the application of an $\eta$-degrading channel on a user's item amplifies privacy, namely, scales down both $\epsilon$ and $\delta$ by $\eta$. This implies that in the first scenario above with an $\eta$-degrading channel, the mutual information between a uniform item from $\mathcal{V}$ and the output of the $(\epsilon, \delta)$-local randomizer is $O\left(\eta^2\epsilon^2 + \frac{\delta}{\epsilon}\log(d\epsilon/\delta)\right)$. We use such mutual information bound together with Fano's inequality to show that for $\eta = \Omega\left(\frac{1}{\epsilon}\sqrt{\frac{\log(d)}{n}}\right)$, the error in the first scenario is $\Omega(1)$.
5.1 A Minimax Formulation

Notation and definitions: Let simplex$(d) \subset [0,1]^d$ denote the probability simplex of $d$ corner points. Let $\mathcal{P} \in$ simplex$(d)$ be some probability distribution over the item set $\mathcal{V} = [d]$ (footnote 7). Users' items $v_i$, $i \in [n]$, are assumed to be drawn independently from the distribution $\mathcal{P}$. For every $i \in [n]$, let $Q_i : \mathcal{V} \to \mathcal{Z}$ be any $(\epsilon, \delta)$-local differentially private algorithm ($(\epsilon, \delta)$-local randomizer) used to generate a report $z_i \in \mathcal{Z}$ of user $i$, where $\mathcal{Z}$ is some arbitrary fixed set. All $Q_i$, $i \in [n]$, use independent randomness. Hence, all $z_i$, $i \in [n]$, are independent (but not necessarily identically distributed). Let $\mathcal{A} : \mathcal{Z}^n \to [0,1]^d$ be an algorithm for estimating $\mathcal{P}$ based on the observations $z_1, \ldots, z_n$. Let $\hat{\mathcal{P}} \in [0,1]^d$ denote the output of $\mathcal{A}$. The expected $\ell_\infty$ estimation error for a given input distribution $\mathcal{P}$ and an estimation algorithm $\mathcal{A}$ is defined as

$\mathcal{E}(\mathcal{P}; \mathcal{A}) \triangleq E\left[\|\mathcal{A}(z_1, \ldots, z_n) - \mathcal{P}\|_\infty\right] = E\left[\max_{v \in [d]} |\hat{\mathcal{P}}_v - \mathcal{P}_v|\right]$

where the expectation is taken over the distribution $\mathcal{P}$, the randomness of $Q_i$, $i \in [n]$, and randomness (if any) of $\mathcal{A}$.

The minimax error (minimax rate) is defined as the minimum (over all estimators $\mathcal{A}$) of the maximum (over all distributions $\mathcal{P} \in$ simplex$(d)$) error defined above. That is,

$\mathrm{MinMaxError} \triangleq \min_{\mathcal{A}} \max_{\mathcal{P} \in \mathrm{simplex}(d)} \mathcal{E}(\mathcal{P}; \mathcal{A}).$ (6)

Let $\mathcal{B} : \mathcal{Z}^n \to [0,1]^d$ be an algorithm for estimating the frequency vector $f = \frac{1}{n}\sum_{i=1}^{n} e_{v_i}$ of the items in $\mathcal{V}$ where $e_v$ is the $v$-th standard basis vector in $\mathbb{R}^d$. Let $\hat{f} \in [0,1]^d$ denote the output of $\mathcal{B}$. The error incurred by $\mathcal{B}$ is given in Section 1, that is,

$\mathrm{ERR}(f; \mathcal{B}) = \|\hat{f} - f\|_\infty = \max_{v \in [d]} |\hat{f}(v) - f(v)|.$ (7)

We first provide a lower bound of $\Omega\left(\min\left(\frac{1}{\epsilon}\sqrt{\frac{\log(d)}{n}}, 1\right)\right)$ on (6) and argue using Hoeffding's bound that this implies a lower bound of the same asymptotic order on the expectation of (7), which clearly proves our main result in this section.

Lemma 5.1. For any $\epsilon = O(1)$ and $0 \leq \delta \leq o\left(\frac{1}{n\log(n)}\right)$, and for any sequence $Q_i$, $i \in [n]$, of $(\epsilon, \delta)$-LDP algorithms, the minimax rate satisfies

$\mathrm{MinMaxError} = \Omega\left(\min\left(\frac{1}{\epsilon}\sqrt{\frac{\log(d)}{n}}, 1\right)\right)$

where MinMaxError is defined in (6).

The above lemma is the central technical part of this section. The proof of this lemma is given separately in Section 5.3. First, we formally state and prove our lower bound in the following section.

7 Without loss of generality, we will use $\mathcal{V}$ and $[d]$ interchangeably to denote the item set.
5.2 Main Result

Theorem 5.2 (Lower Bound on the Error of Private Histograms). For any $\epsilon = O(1)$ and $0 \leq \delta \leq o\left(\frac{1}{n\log(n)}\right)$, for any sequence $Q_i : \mathcal{V} \to \mathcal{Z}$, $i \in [n]$, of $(\epsilon, \delta)$-LDP algorithms, and for any algorithm $\mathcal{B} : \mathcal{Z}^n \to [0,1]^d$, there exists a distribution $\mathcal{P} \in$ simplex$(d)$ (from which $v_i$, $i \in [n]$, are sampled in i.i.d. fashion) such that the expected error with respect to such $\mathcal{P}$ is

$E\left[\mathrm{ERR}(f; \mathcal{B})\right] = \Omega\left(\min\left(\frac{1}{\epsilon}\sqrt{\frac{\log(d)}{n}}, 1\right)\right)$

where $\mathrm{ERR}(f; \mathcal{B})$ is defined in (7).

Proof. First, note that for the case where $n < \frac{\log(d)}{\epsilon^2}$, the above theorem follows directly from Lemma 5.1 since (as given in the proof of this lemma in Section 5.3) our example distribution $\mathcal{P}$ will simply be a degenerate distribution and hence $f = \mathcal{P}$ with probability 1.

Turning to the case where $n \geq \frac{\log(d)}{\epsilon^2}$, having Lemma 5.1 in hand, the proof of Theorem 5.2 in this case becomes a simple application of Hoeffding's inequality. First, fix $\epsilon, \delta$ as in the theorem statement and let $Q_i : \mathcal{V} \to \mathcal{Z}$, $i \in [n]$, be $(\epsilon, \delta)$-LDP mechanisms. Suppose, for the sake of contradiction, that there is an algorithm $\mathcal{B} : \mathcal{Z}^n \to [0,1]^d$ such that for any distribution $\mathcal{P} \in$ simplex$(d)$ from which the users' items $v_1, \ldots, v_n$ are sampled in i.i.d. fashion, the error $E\left[\mathrm{ERR}(f; \mathcal{B})\right] \neq \Omega\left(\sqrt{\frac{\log(d)}{\epsilon^2 n}}\right)$. Let $\hat{f}$ denote the output of $\mathcal{B}$. Now, observe that for sufficiently large $n$ and $d$, we have

$\mathcal{E}(\mathcal{P}; \mathcal{B}) = E\left[\|\hat{f} - \mathcal{P}\|_\infty\right] \leq E\left[\|\hat{f} - f\|_\infty\right] + E\left[\|f - \mathcal{P}\|_\infty\right] \leq E\left[\mathrm{ERR}(f; \mathcal{B})\right] + \sqrt{\frac{\log(d)}{n}}$ (8)

where the last inequality in (8) follows from using Hoeffding's inequality and the fact that

$E\left[\|f - \mathcal{P}\|_\infty\right] = \int_{t=0}^{\infty} \Pr\left[\|f - \mathcal{P}\|_\infty > t\right] dt.$

Hence, $\mathcal{E}(\mathcal{P}; \mathcal{B}) \neq \Omega\left(\sqrt{\frac{\log(d)}{\epsilon^2 n}}\right)$. However, this contradicts Lemma 5.1. Therefore, the proof is complete. □
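The Hoeffding step behind the last inequality in (8) can be spelled out as follows. This is one way to carry out the calculation, with a cut-off $t_0$ chosen here for convenience. The constants are not taken from the paper.

```latex
% Each f(v) is an average of n i.i.d. indicators with mean P_v, so Hoeffding's
% inequality and a union bound over the d coordinates give
\Pr\big[\,|f(v) - \mathcal{P}_v| > t\,\big] \le 2e^{-2nt^2}
\quad\Longrightarrow\quad
\Pr\big[\,\|f - \mathcal{P}\|_\infty > t\,\big] \le \min\{1,\ 2d\,e^{-2nt^2}\}.

% Split the tail integral at t_0 = \sqrt{\log(2d)/(2n)}, where 2d\,e^{-2nt_0^2} = 1,
% and use \int_a^\infty e^{-ct^2}\,dt \le e^{-ca^2}/(2ca) with c = 2n:
E\big[\|f - \mathcal{P}\|_\infty\big]
  = \int_0^\infty \Pr\big[\|f - \mathcal{P}\|_\infty > t\big]\,dt
  \le t_0 + \int_{t_0}^\infty 2d\,e^{-2nt^2}\,dt
  \le t_0 + \frac{2d\,e^{-2nt_0^2}}{4nt_0}
  = \sqrt{\frac{\log(2d)}{2n}} + \frac{1}{2\sqrt{2n\log(2d)}}.
```

For all sufficiently large $d$, the right-hand side is at most $\sqrt{\log(d)/n}$, which is the term added in (8).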


5.3 Proof of Lemma 5.1

We first introduce the notion of an $\eta$-degrading channel. For any $\eta \in [0,1]$, an $\eta$-degrading channel $W^{(\eta)} : \mathcal{V} \to \mathcal{V}$ is a randomized mapping that is defined as follows: for every $v \in \mathcal{V}$,

$W^{(\eta)}(v) = \begin{cases} v & \text{with probability } \eta \\ U & \text{with probability } 1 - \eta \end{cases}$

where $U$ is a uniform random variable over $\mathcal{V}$.

Let $\mathrm{MinMaxError}_\eta$ be the minimax error resulting from the scenario where each user $i \in [n]$ with item $v_i \in \mathcal{V}$ first applies $v_i$ to an independent copy $W_i^{(\eta)}$ of an $\eta$-degrading channel, then applies the output to its $(\epsilon, \delta)$-LDP randomizer $Q_i$ that outputs the report $z_i$. That is, $\mathrm{MinMaxError}_\eta$ is the minimax error when $Q_i(\cdot)$ is replaced with $Q_i\left(W_i^{(\eta)}(\cdot)\right)$, $i \in [n]$. Our proof relies on the following lemma.
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Lemma 5.3. Let rj E [0,1]. If MinMaxError < LL, then MinMaxError,^ < 


Proof. Suppose that MinMaxError < Then, there is an algorithm A : Z n —> [0, l] d such that for any 
distribution V E simplex(d) on V, we have 


E 


M(Ql(vi),...,Qn(Vn))-^ 


<TL 

~ 10 


( 10 ) 


where Vi is drawn from V independently for every i E [n]. 

Let denote the distribution of the output of the ^-degrading channel Note that if the distri¬ 

bution of the input of is V, then 


W (??) = rfP + (1 - r])U 


( 11 ) 


where U is the uniform distribution on V. 

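Consider an algorithm $\mathcal{A}_\eta$ defined as follows. For any input $(z_1, \ldots, z_n) \in \mathcal{Z}^n$, $\mathcal{A}_\eta$ runs $\mathcal{A}$ on its input to obtain $\hat{\mathcal{P}} \in [0,1]^d$. Then, $\mathcal{A}_\eta$ computes a vector $\hat{\mathcal{P}}^{(\eta)} \in [0,1]^d$ whose entries $\hat{p}^{(\eta)}_v$, $v \in [d]$, are given by $\frac{1}{\eta}\left(\hat{p}_v - \frac{1-\eta}{d}\right)$ rounded to $[0,1]$. Now, consider the scenario where we replace each $Q_i(\cdot)$ with $Q_i\left(W_i^{(\eta)}(\cdot)\right)$ for all $i \in [n]$. Observe that for any distribution $\mathcal{P}$ of users' items, we have

$$\begin{aligned}
\mathcal{E}(\mathcal{P}, \mathcal{A}_\eta) = \mathbb{E}\left[\left\|\hat{\mathcal{P}}^{(\eta)} - \mathcal{P}\right\|_\infty\right]
&\le \mathbb{E}\left[\left\|\tfrac{1}{\eta}\left[\hat{\mathcal{P}} - (1-\eta)\,\mathcal{U}\right] - \mathcal{P}\right\|_\infty\right]
= \frac{1}{\eta}\,\mathbb{E}\left[\left\|\hat{\mathcal{P}} - \mathcal{W}^{(\eta)}\right\|_\infty\right] \qquad (12) \\
&= \frac{1}{\eta}\,\mathbb{E}\left[\left\|\mathcal{A}\left(Q_1(y_1), \ldots, Q_n(y_n)\right) - \mathcal{W}^{(\eta)}\right\|_\infty\right] \le \frac{1}{10} \qquad (13)
\end{aligned}$$

where $y_i$ is drawn from $\mathcal{W}^{(\eta)}$ independently for every $i \in [n]$. Note that the last equality in (12) follows from (11), and (13) follows from (10). □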
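The post-processing step used in this proof can be sketched as follows. This is an illustrative sketch only: the function name `debias_estimate` is hypothetical, and the estimate is assumed to be given as a plain list of $d$ floats.

```python
def debias_estimate(p_hat, eta):
    """Invert the mixture W = eta * P + (1 - eta) * U coordinate-wise.

    p_hat is an estimate of the degraded distribution W over d items.
    Each entry is mapped to (p_hat[v] - (1 - eta) / d) / eta and then
    clipped ("rounded") to the interval [0, 1].
    """
    if not 0.0 < eta <= 1.0:
        raise ValueError("eta must lie in (0, 1]")
    d = len(p_hat)
    shift = (1.0 - eta) / d  # mass contributed by the uniform branch
    return [min(1.0, max(0.0, (p - shift) / eta)) for p in p_hat]
```

An $\ell_\infty$ error of $\alpha$ in the estimate of $\mathcal{W}^{(\eta)}$ becomes an error of at most $\alpha/\eta$ after this step, since the clipping can only move an entry closer to a target in $[0,1]$. This is where the factor $1/\eta$ in (12) comes from.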


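Given Lemma 5.3, our proof proceeds as follows. We show that for a setting of $\eta = \Theta\left(\min\left(\sqrt{\frac{\log(d)}{\epsilon^2 n}}, 1\right)\right)$, we have $\mathrm{MinMaxError}_\eta > \frac{1}{10}$ which, by Lemma 5.3, implies that $\mathrm{MinMaxError} = \Omega\left(\min\left(\sqrt{\frac{\log(d)}{\epsilon^2 n}}, 1\right)\right)$, which will complete our proof.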

We consider the following scenario. Let $V$ be a uniform random variable on $\mathcal{V}$. Conditioned on $V = v$, for all $i \in [n]$, $v_i = v$, i.e., all users have the same item $v$ when $V = v$. Each user $i$ applies an independent copy $W_i^{(\eta)}$ of an $\eta$-degrading channel to its item $v_i$. The output is then fed to the user's $(\epsilon, \delta)$-local randomizer $Q_i$ that outputs the user's report $z_i$. Let $\mathcal{G} : \mathcal{Z}^n \to \mathcal{V}$ be an algorithm that, given the users' reports $z_1, \ldots, z_n$, outputs an estimate $\hat{V}$ for the common item $V$.

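Fano's inequality: Let $P_{\mathrm{error}}(\mathcal{G})$ be the probability of the error event that $\mathcal{G}$ outputs a wrong hypothesis $\hat{V} \neq V$. That is,

$$P_{\mathrm{error}}(\mathcal{G}) = \Pr\left[\mathcal{G}(z_1, \ldots, z_n) \neq V\right].$$

Fano's inequality gives a lower bound on the probability of error incurred by any such estimator $\mathcal{G}$:

$$P_{\text{min-error}} = \min_{\mathcal{G}} P_{\mathrm{error}}(\mathcal{G}) \ge 1 - \frac{I(v_1, \ldots, v_n;\, z_1, \ldots, z_n) + 1}{\log(d)} \qquad (14)$$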
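To get a feel for the numbers in (14), the bound can be evaluated directly. The sketch below assumes that information is measured in bits (base-2 logarithms), which is the convention under which the additive 1 is one bit; the function name `fano_error_lower_bound` is hypothetical.

```python
import math

def fano_error_lower_bound(mutual_info_bits, d):
    """Evaluate 1 - (I + 1) / log2(d), truncated below at 0.

    mutual_info_bits: mutual information I between the hidden item and
        the reports, in bits (assumed convention).
    d: size of the item universe (d >= 2).
    """
    if d < 2:
        raise ValueError("d must be at least 2")
    bound = 1.0 - (mutual_info_bits + 1.0) / math.log2(d)
    return max(0.0, bound)  # a probability bound below 0 is vacuous
```

For example, with $d = 2^{20}$ items and 9 bits of mutual information, the expression evaluates to $1 - 10/20 = 1/2$, so any estimator errs with probability at least one half.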
One can easily see that the minimax error $\mathrm{MinMaxError}_\eta$ is bounded from below as

$$\mathrm{MinMaxError}_\eta \ge \min_{v \neq v'} \frac{1}{2}\left\|\mathbf{e}_v - \mathbf{e}_{v'}\right\|_\infty P_{\text{min-error}} = \frac{1}{2}\, P_{\text{min-error}} \qquad (15)$$


To reach our goal, given (14)-(15), it suffices to show that for a setting of $\eta = \Theta\left(\min\left(\sqrt{\frac{\log(d)}{\epsilon^2 n}}, 1\right)\right)$, we have $\frac{I(v_1, \ldots, v_n;\, z_1, \ldots, z_n)}{\log(d)} \le \frac{1}{2}$. This will be established using the following claims.

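Claim 5.4. Let $V$ be uniformly distributed over $[d]$. Let $Q : [d] \to \mathcal{Z}$ be an $(\epsilon, \delta)$-LDP algorithm and let $Z$ denote $Q(V)$. Then, we have

$$I(V; Z) = O\left(\epsilon^2 + \frac{\delta}{\epsilon}\log(d) + \frac{\delta}{\epsilon}\log(\epsilon/\delta)\right).$$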


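Proof. Let $M_v$ denote the probability density function of the output of $Q(v)$, $v \in [d]$. Let $M(z) = \frac{1}{d}\sum_{u \in [d]} M_u(z)$, $z \in \mathcal{Z}$. For every $v \in [d]$, define

$$\mathrm{Bad}_v^{(1)} = \left\{ z \in \mathcal{Z} : \frac{M_v(z)}{M(z)} > e^{2\epsilon} \right\}$$

and

$$\mathrm{Bad}_v^{(2)} = \left\{ z \in \mathcal{Z} : \frac{M_v(z)}{M(z)} < e^{-2\epsilon} \right\}.$$

Let $\mathrm{Bad}_v = \mathrm{Bad}_v^{(1)} \cup \mathrm{Bad}_v^{(2)}$ and $\mathrm{Bad} = \bigcup_{v \in [d]} \left(\mathrm{Bad}_v^{(1)} \cup \mathrm{Bad}_v^{(2)}\right)$. Let $B$ be a binary random variable that takes value 1 whenever $Z \in \mathrm{Bad}_V$ and 0 otherwise. Now, observe that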


$$\begin{aligned}
I(V; Z) &\le I(V; Z, B) \le I(V; Z \mid B) + H(B) \\
&\le I(V; Z \mid B = 0) + I(V; Z \mid B = 1)\Pr[\mathrm{Bad}] + H(B) \qquad (16)
\end{aligned}$$

where $H(B)$ denotes the Shannon entropy of $B$. The above inequalities follow from the standard properties of the mutual information between any pair of random variables and the fact that $\Pr[B = 1] = \Pr[\mathrm{Bad}]$.


First, we consider the first term in (16). Conditioned on $B = 0$, $Z$ lies in a set $\left\{ z \in \mathcal{Z} : e^{-2\epsilon} \le \frac{M_V(z)}{M(z)} \le e^{2\epsilon} \right\}$. Hence, we can obtain a bound of $O(\epsilon^2)$ on $I(V; Z \mid B = 0)$ by applying techniques that were originally used for pure $\epsilon$-local differential privacy like those in 0 (Corollary 1 therein).

Next, observe that $I(V; Z \mid B = 1) \le H(V) \le \log(d)$. Thus, it remains to bound $\Pr[\mathrm{Bad}]$ (and consequently bound $H(B)$). Observe that


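$$\begin{aligned}
\Pr\left[\mathrm{Bad}_v^{(1)}\right] &= \int_{z \in \mathrm{Bad}_v^{(1)}} M_v(z)\,dz > e^{2\epsilon} \int_{z \in \mathrm{Bad}_v^{(1)}} M(z)\,dz \\
&\ge e^{\epsilon} \int_{z \in \mathrm{Bad}_v^{(1)}} M_v(z)\,dz - e^{\epsilon}\delta = e^{\epsilon} \Pr\left[\mathrm{Bad}_v^{(1)}\right] - e^{\epsilon}\delta
\end{aligned}$$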




Footnote. To avoid cumbersome notation, we will ignore irrelevant technicalities of the measure defined on $\mathcal{Z}$ as well as the distinctions between probability mass functions and density functions, and will use the notation $\int_{z \in \mathcal{S}} M_v(z)\,dz$ to simply mean $\Pr[Z \in \mathcal{S} \mid V = v]$ whether $Z$ is a discrete or a continuous random variable.
where the last inequality above follows from the fact that $Q$ is $(\epsilon, \delta)$-differentially private. Hence, we get

$$\Pr\left[\mathrm{Bad}_v^{(1)}\right] \le \frac{e^{\epsilon}\delta}{e^{\epsilon} - 1} = O(\delta/\epsilon).$$

Similarly, we can bound $\Pr\left[\mathrm{Bad}_v^{(2)}\right] \le \frac{\delta}{e^{\epsilon} - 1} = O(\delta/\epsilon)$. Hence, $\Pr[\mathrm{Bad}] = O(\delta/\epsilon)$, which gives us the required bound on the second term of (16). Finally, note that the bound on $\Pr[\mathrm{Bad}]$ implies a bound of $O\left(\frac{\delta}{\epsilon}\log(\epsilon/\delta)\right)$ on $H(B)$. This completes the proof. □
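The two asymptotic simplifications used at the end of this proof can be checked by hand. The following is a sketch with unoptimized constants, writing $p$ for $\Pr[\mathrm{Bad}]$ and $h(\cdot)$ for the binary entropy function (both names are introduced here for illustration) and using natural logarithms.

```latex
% (i) The fraction is O(delta/epsilon) when epsilon = O(1), because
%     e^{-\epsilon} \le 1/(1+\epsilon), i.e. 1 - e^{-\epsilon} \ge \epsilon/(1+\epsilon):
\begin{align*}
\frac{e^{\epsilon}\delta}{e^{\epsilon}-1}
  = \frac{\delta}{1-e^{-\epsilon}}
  \le \delta\,\frac{1+\epsilon}{\epsilon}
  = \delta\Big(1+\frac{1}{\epsilon}\Big)
  = O\!\Big(\frac{\delta}{\epsilon}\Big).
\end{align*}
% (ii) The entropy of a Bernoulli(p) variable satisfies, using
%      (1-p)\ln\frac{1}{1-p} \le p:
\begin{align*}
H(B) = h(p) = p\ln\frac{1}{p} + (1-p)\ln\frac{1}{1-p}
  \le p\ln\frac{e}{p},
\end{align*}
% which, for small p = O(\delta/\epsilon), is O((\delta/\epsilon)\log(\epsilon/\delta)).
```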


Claim 5.5 (Privacy amplification via degrading channels). The composition $Q\left(W^{(\eta)}(\cdot)\right)$ of an $\eta$-degrading channel $W^{(\eta)}$ (defined in (9)) with an $(\epsilon, \delta)$-LDP algorithm $Q$ yields a $\left(O(\eta\epsilon), O(\eta\delta)\right)$-LDP algorithm.

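Proof. Fix any measurable subset $S \subseteq \mathcal{Z}$. For any $v \in \mathcal{V}$, let $M_v^{(\eta)}(S)$ denote $\Pr\left[Q\left(W^{(\eta)}(v)\right) \in S\right]$ and $M_v(S)$ denote $\Pr\left[Q(v) \in S\right]$. Fix any pair $v, v' \in \mathcal{V}$. Observe that

$$M_v^{(\eta)}(S) = \eta\, M_v(S) + (1 - \eta)\,\frac{1}{d}\sum_{u \in \mathcal{V}} M_u(S)$$

and

$$M_{v'}^{(\eta)}(S) = \eta\, M_{v'}(S) + (1 - \eta)\,\frac{1}{d}\sum_{u \in \mathcal{V}} M_u(S).$$

Hence, we can write $M_v^{(\eta)}(S)$ as

$$\begin{aligned}
M_v^{(\eta)}(S) &= \eta\left(M_v(S) - M_{v'}(S)\right) + M_{v'}^{(\eta)}(S) \\
&\le \eta\,(e^{\epsilon} - 1)\, M_{v'}(S) + M_{v'}^{(\eta)}(S) + \eta\delta \\
&\le \left(1 + \eta\, e^{\epsilon}(e^{\epsilon} - 1)\right) M_{v'}^{(\eta)}(S) + e^{\epsilon}\eta\delta \qquad (17)
\end{aligned}$$

The last inequality follows from the fact that $Q$ is $(\epsilon, \delta)$-differentially private, hence,

$$M_{v'}(S) \le e^{\epsilon}\, \mathbb{E}_{y \sim W^{(\eta)}(v')}\left[M_y(S)\right] + \delta = e^{\epsilon} M_{v'}^{(\eta)}(S) + \delta.$$

We conclude the proof by noting that $1 + \eta\, e^{\epsilon}(e^{\epsilon} - 1) = e^{O(\eta\epsilon)}$ and $e^{\epsilon}\eta\delta = O(\eta\delta)$ since $\epsilon$ is $O(1)$. □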
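The amplification effect can be checked numerically on a concrete mechanism. The sketch below is an illustration, not part of the proof: it assumes $Q$ is $d$-ary randomized response (a standard pure $\epsilon$-LDP mechanism with $\delta = 0$, chosen here only as an example), and all function names are hypothetical.

```python
import math

def krr_probs(d, eps):
    """d-ary randomized response: probability p of reporting the true item
    and probability q of reporting any one fixed other item."""
    denom = math.exp(eps) + d - 1
    return math.exp(eps) / denom, 1.0 / denom

def composed_epsilon(d, eps, eta):
    """Exact privacy parameter of randomized response applied to the output
    of an eta-degrading channel (worst-case log-ratio of output probabilities)."""
    p, q = krr_probs(d, eps)
    mix = (1.0 - eta) / d  # uniform branch: (1/d) * sum_u M_u(z) = 1/d
    return math.log((eta * p + mix) / (eta * q + mix))

def claim_bound(eps, eta):
    """Multiplicative factor from (17), on the log scale."""
    return math.log(1.0 + eta * math.exp(eps) * (math.exp(eps) - 1.0))
```

For instance, `composed_epsilon(1024, 1.0, 0.05)` is well below the original `eps = 1.0` and below `claim_bound(1.0, 0.05)`, in line with the $O(\eta\epsilon)$ scaling for $\epsilon = O(1)$.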

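Putting these claims together with the fact that $I(v_1, \ldots, v_n;\, z_1, \ldots, z_n) \le \sum_{i=1}^{n} I(v_i; z_i)$, we reach that, for $\delta = o\left(\frac{1}{n\log(n)}\right)$, we have

$$\frac{I(v_1, \ldots, v_n;\, z_1, \ldots, z_n)}{\log(d)} = O\left(\frac{n\eta^2\epsilon^2}{\log(d)} + \frac{1}{\log(n)} + \frac{1}{\log(d)}\right)$$

which, by an appropriate setting of $\eta = \Theta\left(\min\left(\sqrt{\frac{\log(d)}{\epsilon^2 n}}, 1\right)\right)$, can be made smaller than $1/2$. This completes the proof of Lemma 5.1.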
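The effect of this choice of $\eta$ on the leading term $n\eta^2\epsilon^2/\log(d)$ can be illustrated with a short computation. This is a sketch under assumptions: the constant `c` hidden in the $\Theta(\cdot)$ is not specified in the text, so the value 0.5 below is purely hypothetical, as are the function names.

```python
import math

def choose_eta(n, d, eps, c=0.5):
    """Set eta = min(c * sqrt(log(d) / (eps^2 * n)), 1) for a constant c
    (c is an assumed placeholder for the constant hidden in the Theta)."""
    return min(c * math.sqrt(math.log(d) / (eps ** 2 * n)), 1.0)

def leading_term(n, d, eps, eta):
    """The term n * eta^2 * eps^2 / log(d) from the final bound."""
    return n * eta ** 2 * eps ** 2 / math.log(d)
```

With this choice the leading term is at most `c ** 2` for every `n`, `d`, and `eps`: it equals `c ** 2` when the minimum is attained by the square-root expression, and is smaller when `eta` is capped at 1. Taking `c` small enough therefore keeps this term below any desired constant.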